Notation



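$\mathbb{R}$, $\mathbb{R}^+$, $\mathbb{R}^n$          real numbers, reals greater than 0, n-tuples of reals
$\mathbb{N}$, $\mathbb{C}$                            natural numbers: {0, 1, 2, ...}, complex numbers
$(a\,..\,b)$, $[a\,..\,b]$                            interval (open, closed) of reals between a and b
$\langle \ldots \rangle$                              sequence; like a set but order matters
$V, W, U$                                             vector spaces
$\vec{v}, \vec{w}$, $\vec{0}$, $\vec{0}_V$            vectors, zero vector, zero vector of V
$B, D$, $\vec{\beta}, \vec{\delta}$                   bases, basis vectors
$\mathcal{E}_n = \langle \vec{e}_1, \ldots, \vec{e}_n \rangle$   standard basis for $\mathbb{R}^n$
$\mathrm{Rep}_B(\vec{v})$                             matrix representing the vector
$\mathcal{P}_n$                                       set of degree n polynomials
$\mathcal{M}_{n \times m}$                            set of n×m matrices
$[S]$                                                 span of the set S
$M \oplus N$                                          direct sum of subspaces
$V \cong W$                                           isomorphic spaces
$h, g$                                                homomorphisms, linear maps
$H, G$                                                matrices
$t, s$                                                transformations; maps from a space to itself
$T, S$                                                square matrices
$\mathrm{Rep}_{B,D}(h)$                               matrix representing the map h
$h_{i,j}$                                             matrix entry from row i, column j
$Z_{n \times m}$ or $Z$, $I_{n \times n}$ or $I$      zero matrix, identity matrix
$|T|$                                                 determinant of the matrix T
$\mathcal{R}(h)$, $\mathcal{N}(h)$                    range space and null space of the map h
$\mathcal{R}_\infty(h)$, $\mathcal{N}_\infty(h)$      generalized range space and null space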


Lower case Greek alphabet, with pronunciation

character  name                      character  name
α          alpha    AL-fuh           ν          nu       NEW
β          beta     BAY-tuh          ξ          xi       KSIGH
γ          gamma    GAM-muh          ο          omicron  OM-uh-CRON
δ          delta    DEL-tuh          π          pi       PIE
ε          epsilon  EP-suh-lon       ρ          rho      ROW
ζ          zeta     ZAY-tuh          σ          sigma    SIG-muh
η          eta      AY-tuh           τ          tau      TOW as in cow
θ          theta    THAY-tuh         υ          upsilon  OOP-suh-LON
ι          iota     eye-OH-tuh       φ          phi      FEE, or FI as in hi
κ          kappa    KAP-uh           χ          chi      KI as in hi
λ          lambda   LAM-duh          ψ          psi      SIGH, or PSIGH
μ          mu       MEW              ω          omega    oh-MAY-guh



Preface 



This book helps students to master the material of a standard US undergraduate 
first course in Linear Algebra. 

The material is standard in that the subjects covered are Gaussian reduction, 
vector spaces, linear maps, determinants, and eigenvalues and eigenvectors. 
Another standard is the book's audience: sophomores or juniors, usually with 
a background of at least one semester of calculus. The help that it gives to 
students comes from taking a developmental approach — this book's presentation 
emphasizes motivation and naturalness, using many examples as well as extensive 
and careful exercises. 

The developmental approach is what most recommends this book so I will 
elaborate. Courses at the beginning of a mathematics program focus less on 
theory and more on calculating. Later courses ask for mathematical maturity: the 
ability to follow different types of arguments, a familiarity with the themes that 
underlie many mathematical investigations such as elementary set and function 
facts, and a capacity for some independent reading and thinking. Some programs 
have a separate course devoted to developing maturity and some do not. In 
either case, a Linear Algebra course is an ideal spot to work on this transition. 
It comes early in a program so that progress made here pays off later but also 
comes late enough that students are serious about mathematics. The material 
is accessible, coherent, and elegant. There are a variety of argument styles, 
including direct proofs, proofs by contradiction, and proofs by induction. And, 
examples are plentiful. 

Helping readers start the transition to being serious students of mathematics 
requires taking the mathematics seriously so all of the results here are proved. 
On the other hand, we cannot assume that students have already arrived and so 
in contrast with more advanced texts this book is filled with examples, often 
quite detailed. 

Some books that assume a not-yet-sophisticated reader begin with extensive 
computations of linear systems, matrix multiplications, and determinants. Then, 
when vector spaces and linear maps finally appear and definitions and proofs 
start, the abrupt change can bring students to an abrupt stop. While this book 
begins with linear reduction, from the start we do more than compute. The 
first chapter includes proofs showing that linear reduction gives a correct and 
complete solution set. Then, with the linear systems work as motivation so that 
the study of linear combinations is natural, the second chapter starts with the 
definition of a real vector space. In the schedule below this happens at the start 
of the third week. 

Another example of this book's emphasis on motivation and naturalness 
is that the third chapter on linear maps does not begin with the definition of 
homomorphism. Instead it begins with the definition of isomorphism, which is 
natural: students themselves observe that some spaces are "the same" as others. 
After that, the next section takes the reasonable step of isolating the operation- 
preservation idea to define homomorphism. This approach loses mathematical 
slickness but it is a good trade because it gives to students a large gain in 
sensibility. 

A student progresses most in mathematics while doing exercises. In this 
book problem sets start with simple checks and range up to reasonably involved 
proofs. Since instructors usually assign about a dozen exercises I have tried to 
put two dozen in each set, thereby giving a selection. There are even a few that 
are puzzles taken from various journals, competitions, or problem collections. 
These are marked with a '?' and as part of the fun I have retained the original 
wording as much as possible. 

That is, as with the rest of the book the exercises are aimed to both build 
an ability at, and help students experience the pleasure of, doing mathematics. 
Students should see how the ideas arise and should be able to picture themselves 
doing the same type of work. 

Applications and computers. The point of view taken here, that students should 
think of Linear Algebra as about vector spaces and linear maps, is not taken to 
the complete exclusion of others. Applications and computing are interesting 
and vital aspects of the subject. Consequently each of this book's chapters closes 
with a few topics in those areas. They are brief enough that an instructor can do 
one in a day's class or can assign them as independent or small-group projects. 
Most simply give a reader a taste of the subject, discuss how Linear Algebra 
comes in, point to some further reading, and give a few exercises. Whether they 
figure formally in a course or not these help readers see for themselves that 
Linear Algebra is a tool that a professional must master. 

Availability. This book is freely available. In particular, instructors can print 
copies for students and sell them out of a college bookstore. See 
http://joshua.smcvt.edu/linearalgebra for the license details. That page also has the latest 
version, exercise answers, beamer slides, and LaTeX source. 

A text is a large and complex project. One of the lessons of software 
development is that such a project will have errors. I welcome bug reports and I 
periodically issue revisions. My contact information is on the web page. 

Monday
Wednesday
Friday
One.I.1
One.I.1, 2
One.I.2, 3
One.I.3
One.III.1
One.III.2
Two.I.1
Two.I.1, 2
Two.I.2
Two.II.1
Two.III.2
Two.III.2
Two.III.2, 3
Two.III.3
Three.I.1
Three.II.1
Three.II.1
Three.II.2
Three.III.2
Three.IV.1, 2
10
Three.IV.4
Three.V.1
Three.V.2
Four.I.1
12
Four.I.2
Four.III.1
13
Five.II.1
-Thanksgiving break-
14
Five.II.1, 2
Five.II.2
Five.II.3


This schedule supposes that you already know Section One.II, the elements of 
vectors. In the above course, in addition to the shown exams and to the final 
exam that is not shown, students must do take-home problem sets that include 
proofs. That is, the computations are important but so are the proofs. 

In the table of contents I have marked subsections as optional if some 
instructors will pass over them in favor of spending more time elsewhere. 

You might pick one or two topics that appeal to you from the end of each 
chapter. You'll get more from these if you have access to software for calculations. 
I recommend Sage, freely available from http://sagemath.org. 

My main advice is: do many exercises. I have marked a good sample with 
✓'s in the margin. Do not simply read some answers; you must actually try the 
problems and quite possibly struggle with some of them. For all of the exercises 
you must justify your answer either with a computation or with a proof. Be 
aware that few people can write correct proofs without training. Try to find a 
knowledgeable person to work with you on these. 

Finally, a caution for all students, independent or not: I cannot overemphasize 
that the statement, "I understand the material but it is only that I have trouble 
with the problems" shows a misconception. Being able to do things with the 
ideas is their entire point. The quotes below express this sentiment admirably. 
They capture the essence of both the beauty and the power of mathematics and 
science in general, and of Linear Algebra in particular. (I took the liberty of 
formatting them as poetry). 

I know of no better tactic 
than the illustration of exciting principles 
by well-chosen particulars. 

-Stephen Jay Gould 

If you really wish to learn 

then you must mount the machine 

and become acquainted with its tricks 
by actual trial. 

-Wilbur Wright 

Jim Hefferon 

Mathematics, Saint Michael's College 

Colchester, Vermont USA 05439 

http://joshua.smcvt.edu/linearalgebra 

2012-Feb-29 



Author's Note. Inventing a good exercise, one that enlightens as well as tests, 
is a creative act and hard work. The inventor deserves recognition. But texts 
have traditionally not given attributions for questions. I have changed that here 
where I was sure of the source. I would be glad to hear from anyone who can 
help me to correctly attribute others of the questions. 

Chapter One 

Linear Systems 



I Solving Linear Systems 



Systems of linear equations are common in science and mathematics. These two 
examples from high school science [Onan] give a sense of how they arise. 

The first example is from Statics. Suppose that we have three objects, one 
with a mass known to be 2 kg and we want to find the unknown masses. Suppose 
further that experimentation with a meter stick produces these two balances. 

[Figure: the two balances.]

For the masses to balance we must have that the sum of moments on the left 
equals the sum of moments on the right, where the moment of an object is its 
mass times its distance from the balance point. That gives a system of two 
equations. 

$$\begin{aligned}
40h + 15c &= 100 \\
25c &= 50 + 50h
\end{aligned}$$

The second example of a linear system is from Chemistry. We can mix, 
under controlled conditions, toluene $\mathrm{C_7H_8}$ and nitric acid $\mathrm{HNO_3}$ to produce 
trinitrotoluene $\mathrm{C_7H_5O_6N_3}$ along with the byproduct water (conditions have 
to be very well controlled; trinitrotoluene is better known as TNT). In what 
proportion should we mix them? The number of atoms of each element present 
before the reaction 

$$x\,\mathrm{C_7H_8} + y\,\mathrm{HNO_3} \longrightarrow z\,\mathrm{C_7H_5O_6N_3} + w\,\mathrm{H_2O}$$

must equal the number present afterward. Applying that in turn to the elements 
C, H, N, and O gives this system. 

$$\begin{aligned}
7x &= 7z \\
8x + 1y &= 5z + 2w \\
1y &= 3z \\
3y &= 6z + 1w
\end{aligned}$$

Both examples come down to solving a system of equations. In each system, 
the equations involve only the first power of each variable. This chapter shows 
how to solve any such system. 



I.1 Gauss's Method 

1.1 Definition A linear combination of $x_1, \ldots, x_n$ has the form 

$$a_1x_1 + a_2x_2 + a_3x_3 + \cdots + a_nx_n$$

where the numbers $a_1, \ldots, a_n \in \mathbb{R}$ are the combination's coefficients. A linear 
equation in the variables $x_1, \ldots, x_n$ has the form $a_1x_1 + a_2x_2 + a_3x_3 + \cdots + a_nx_n = d$ 
where $d \in \mathbb{R}$ is the constant. 

An n-tuple $(s_1, s_2, \ldots, s_n) \in \mathbb{R}^n$ is a solution of, or satisfies, that equation 
if substituting the numbers $s_1, \ldots, s_n$ for the variables gives a true statement: 
$a_1s_1 + a_2s_2 + \cdots + a_ns_n = d$. A system of linear equations 

$$\begin{aligned}
a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n &= d_1 \\
a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n &= d_2 \\
&\;\;\vdots \\
a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,n}x_n &= d_m
\end{aligned}$$

has the solution $(s_1, s_2, \ldots, s_n)$ if that n-tuple is a solution of all of the equations 
in the system. 

1.2 Example The combination $3x_1 + 2x_2$ of $x_1$ and $x_2$ is linear. The combination 
$3x_1^2 + 2\sin(x_2)$ is not linear, nor is $3x_1^2 + 2x_2$. 

1.3 Example The ordered pair $(-1, 5)$ is a solution of this system. 

$$\begin{aligned}
3x_1 + 2x_2 &= 7 \\
-x_1 + x_2 &= 6
\end{aligned}$$

In contrast, $(5, -1)$ is not a solution. 
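
The check in Example 1.3 is just substitution, so it is easy to mechanize. The following is a minimal sketch, not part of the original text; the function name `is_solution` and the representation of an equation as a pair (coefficients, constant) are assumptions made for illustration.

```python
from fractions import Fraction

def is_solution(system, candidate):
    """Return True if candidate satisfies every equation in system.

    Each equation is a pair (coefficients, constant) standing for
    a_1*x_1 + ... + a_n*x_n = d.  (This representation is an assumption.)
    """
    return all(
        sum(Fraction(a) * Fraction(s) for a, s in zip(coeffs, candidate)) == Fraction(d)
        for coeffs, d in system
    )

# The system from Example 1.3:  3x1 + 2x2 = 7  and  -x1 + x2 = 6
example_1_3 = [((3, 2), 7), ((-1, 1), 6)]

print(is_solution(example_1_3, (-1, 5)))   # True
print(is_solution(example_1_3, (5, -1)))   # False
```

Exact rational arithmetic avoids any floating point round-off in the comparison.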

Finding the set of all solutions is solving the system. We don't need guesswork 
or good luck; there is an algorithm that always works. This algorithm is Gauss's 
Method (or Gaussian elimination or linear elimination). 

1.4 Example To solve this system 

$$\begin{aligned}
3x_3 &= 9 \\
x_1 + 5x_2 - 2x_3 &= 2 \\
\tfrac{1}{3}x_1 + 2x_2 &= 3
\end{aligned}$$

we transform it, step by step, until it is in a form that we can easily solve. 

The first transformation rewrites the system by interchanging the first and 
third row. 

$$\xrightarrow{\text{swap row 1 with row 3}}\quad
\begin{aligned}
\tfrac{1}{3}x_1 + 2x_2 &= 3 \\
x_1 + 5x_2 - 2x_3 &= 2 \\
3x_3 &= 9
\end{aligned}$$

The second transformation rescales the first row by multiplying both sides of 
the equation by 3. 

$$\xrightarrow{\text{multiply row 1 by 3}}\quad
\begin{aligned}
x_1 + 6x_2 &= 9 \\
x_1 + 5x_2 - 2x_3 &= 2 \\
3x_3 &= 9
\end{aligned}$$

The third transformation is the only nontrivial one in this example. We mentally 
multiply both sides of the first row by $-1$, mentally add that to the second row, 
and write the result in as the new second row. 

$$\xrightarrow{\text{add } -1 \text{ times row 1 to row 2}}\quad
\begin{aligned}
x_1 + 6x_2 &= 9 \\
-x_2 - 2x_3 &= -7 \\
3x_3 &= 9
\end{aligned}$$

The point of these steps is that we've brought the system to a form where we can 
easily find the value of each variable. The bottom equation shows that $x_3 = 3$. 
Substituting 3 for $x_3$ in the middle equation shows that $x_2 = 1$. Substituting 
those two into the top equation gives that $x_1 = 3$. Thus the system has a unique 
solution; the solution set is $\{(3, 1, 3)\}$. 

Most of this subsection and the next one consists of examples of solving 
linear systems by Gauss's Method. We will use it throughout the book. It is fast 
and easy. But before we do those examples we will first show that this Method 
is also safe in that it never loses solutions or picks up extraneous solutions. 

1.5 Theorem (Gauss's Method) If a linear system is changed to another by one of 
these operations 

(1) an equation is swapped with another 

(2) an equation has both sides multiplied by a nonzero constant 

(3) an equation is replaced by the sum of itself and a multiple of another 

then the two systems have the same set of solutions. 

Each of the three Gauss's Method operations has a restriction. Multiplying 
a row by 0 is not allowed because obviously that can change the solution set 
of the system. Similarly, adding a multiple of a row to itself is not allowed 
because adding $-1$ times the row to itself has the effect of multiplying the row 
by 0. Finally, we disallow swapping a row with itself to make some results in 
the fourth chapter easier to state and remember, and also because it's pointless. 
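
To make the three operations and their restrictions concrete, here is a small sketch, offered as an illustration rather than as part of the text. The function names `swap`, `rescale`, and `combine`, the zero-based row indices, and the layout of a row as coefficients followed by the constant are all assumptions. The sketch replays the three steps of Example 1.4.

```python
from fractions import Fraction

# A system is a list of rows; each row lists the coefficients, then the constant.

def swap(rows, i, j):
    """Swap row i with row j (the two rows must differ)."""
    if i == j:
        raise ValueError("swapping a row with itself is not allowed")
    new = [row[:] for row in rows]
    new[i], new[j] = new[j], new[i]
    return new

def rescale(rows, i, k):
    """Multiply both sides of row i by the nonzero constant k."""
    if k == 0:
        raise ValueError("multiplying a row by 0 is not allowed")
    new = [row[:] for row in rows]
    new[i] = [k * entry for entry in new[i]]
    return new

def combine(rows, k, i, j):
    """Replace row j by k times row i plus row j (i and j must differ)."""
    if i == j:
        raise ValueError("adding a multiple of a row to itself is not allowed")
    new = [row[:] for row in rows]
    new[j] = [k * a + b for a, b in zip(new[i], new[j])]
    return new

# Example 1.4, with columns x1, x2, x3, constant and rows indexed from 0.
system = [
    [Fraction(0), Fraction(0), Fraction(3), Fraction(9)],
    [Fraction(1), Fraction(5), Fraction(-2), Fraction(2)],
    [Fraction(1, 3), Fraction(2), Fraction(0), Fraction(3)],
]
step1 = swap(system, 0, 2)          # swap row 1 with row 3
step2 = rescale(step1, 0, 3)        # multiply row 1 by 3
step3 = combine(step2, -1, 0, 1)    # add -1 times row 1 to row 2
for row in step3:
    print([str(entry) for entry in row])
```

Each function returns a new system instead of changing its argument, so every intermediate system from the example remains available for inspection.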

Proof We will cover the equation swap operation here. The other two cases 
are Exercise 1.31. 

Consider the swap of row $i$ with row $j$. The tuple $(s_1, \ldots, s_n)$ satisfies the 
system before the swap if and only if substituting the values for the variables, 
the $s$'s for the $x$'s, gives a conjunction of true statements: 
$a_{1,1}s_1 + a_{1,2}s_2 + \cdots + a_{1,n}s_n = d_1$ and ... 
$a_{i,1}s_1 + a_{i,2}s_2 + \cdots + a_{i,n}s_n = d_i$ and ... 
$a_{j,1}s_1 + a_{j,2}s_2 + \cdots + a_{j,n}s_n = d_j$ and ... 
$a_{m,1}s_1 + a_{m,2}s_2 + \cdots + a_{m,n}s_n = d_m$. 

In a list of statements joined with 'and' we can rearrange the order of the 
statements. Thus this requirement is met if and only if 
$a_{1,1}s_1 + a_{1,2}s_2 + \cdots + a_{1,n}s_n = d_1$ and ... 
$a_{j,1}s_1 + a_{j,2}s_2 + \cdots + a_{j,n}s_n = d_j$ and ... 
$a_{i,1}s_1 + a_{i,2}s_2 + \cdots + a_{i,n}s_n = d_i$ and ... 
$a_{m,1}s_1 + a_{m,2}s_2 + \cdots + a_{m,n}s_n = d_m$. This is exactly 
the requirement that $(s_1, \ldots, s_n)$ solves the system after the row swap. QED 

1.6 Definition The three operations from Theorem 1.5 are the elementary 
reduction operations, or row operations, or Gaussian operations. They are 
swapping, multiplying by a scalar (or rescaling), and row combination. 

When writing out the calculations, we will abbreviate 'row $i$' by '$\rho_i$'. For 
instance, we will denote a row combination operation by $k\rho_i + \rho_j$, with the row 
that changes written second. To save writing we will often combine addition 
steps when they use the same $\rho_i$; see the next example. 

1.7 Example Gauss's Method systematically applies the row operations to solve 
a system. Here is a typical case. 

$$\begin{aligned}
x + y &= 0 \\
2x - y + 3z &= 3 \\
x - 2y - z &= 3
\end{aligned}$$

We begin by using the first row to eliminate the $2x$ in the second row and the $x$ 
in the third. To get rid of the $2x$, we mentally multiply the entire first row by 
$-2$, add that to the second row, and write the result in as the new second row. 
To eliminate the $x$ leading the third row, we multiply the first row by $-1$, add 
that to the third row, and write the result in as the new third row. 

$$\xrightarrow[-\rho_1+\rho_3]{-2\rho_1+\rho_2}\quad
\begin{aligned}
x + y &= 0 \\
-3y + 3z &= 3 \\
-3y - z &= 3
\end{aligned}$$

To finish we transform the second system into a third system, where the last 
equation involves only one unknown. We use the second row to eliminate $y$ from 
the third row. 

$$\xrightarrow{-\rho_2+\rho_3}\quad
\begin{aligned}
x + y &= 0 \\
-3y + 3z &= 3 \\
-4z &= 0
\end{aligned}$$

Now the system's solution is easy to find. The third row shows that $z = 0$. 
Substitute that back into the second row to get $y = -1$ and then substitute 
back into the first row to get $x = 1$. 
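
The last step of Example 1.7, working from the bottom row up, can be sketched in code. This is an illustrative sketch only; the name `back_substitute` is hypothetical, and it assumes a system already in echelon form with as many rows as variables and every variable leading its row.

```python
from fractions import Fraction

def back_substitute(rows):
    """Solve a square echelon-form system in which every variable leads a row.

    Each row lists the n coefficients followed by the constant.
    """
    n = len(rows)
    solution = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):          # work from the bottom row up
        *coeffs, constant = rows[i]
        # Contribution of the variables already found (those to the right).
        known = sum(coeffs[j] * solution[j] for j in range(i + 1, n))
        solution[i] = Fraction(constant - known) / coeffs[i]
    return solution

# Echelon form reached in Example 1.7 (columns x, y, z, constant).
echelon = [
    [1, 1, 0, 0],
    [0, -3, 3, 3],
    [0, 0, -4, 0],
]
x, y, z = back_substitute(echelon)
print(x, y, z)   # 1 -1 0
```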

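1.8 Example For the Physics problem from the start of this chapter, Gauss's 
Method gives this. 

$$\begin{aligned}
40h + 15c &= 100 \\
-50h + 25c &= 50
\end{aligned}
\quad\xrightarrow{(5/4)\rho_1+\rho_2}\quad
\begin{aligned}
40h + 15c &= 100 \\
(175/4)c &= 175
\end{aligned}$$

So $c = 4$, and back-substitution gives that $h = 1$. (We will solve the Chemistry 
problem later.) 

1.9 Example The reduction 

$$\begin{aligned}
x + y + z &= 9 \\
2x + 4y - 3z &= 1 \\
3x + 6y - 5z &= 0
\end{aligned}
\quad\xrightarrow[-3\rho_1+\rho_3]{-2\rho_1+\rho_2}\quad
\begin{aligned}
x + y + z &= 9 \\
2y - 5z &= -17 \\
3y - 8z &= -27
\end{aligned}$$

$$\xrightarrow{-(3/2)\rho_2+\rho_3}\quad
\begin{aligned}
x + y + z &= 9 \\
2y - 5z &= -17 \\
-(1/2)z &= -(3/2)
\end{aligned}$$

shows that $z = 3$, $y = -1$, and $x = 7$. 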

As illustrated above, the point of Gauss's Method is to use the elementary 
reduction operations to set up back-substitution. 

1.10 Definition In each row of a system, the first variable with a nonzero coefficient 
is the row's leading variable. A system is in echelon form if each leading 
variable is to the right of the leading variable in the row above it (except for the 
leading variable in the first row). 
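
Definition 1.10 translates directly into a test on the positions of the leading variables. The sketch below is illustrative; the names `leading_index` and `is_echelon_form` are hypothetical, and it adds one assumption that the definition does not state explicitly, namely that rows with no leading variable (such as '0 = 0') must come after all rows that have one.

```python
def leading_index(row):
    """Index of the first nonzero coefficient, or None if there is none."""
    coeffs = row[:-1]                     # the last entry is the constant
    for index, a in enumerate(coeffs):
        if a != 0:
            return index
    return None

def is_echelon_form(rows):
    """Check that each leading variable is to the right of the one above it."""
    previous = -1
    seen_row_without_leader = False
    for row in rows:
        lead = leading_index(row)
        if lead is None:
            seen_row_without_leader = True
            continue
        # Assumption: a row with a leading variable may not follow one without.
        if seen_row_without_leader or lead <= previous:
            return False
        previous = lead
    return True

# The second and third systems of Example 1.7 (columns x, y, z, constant).
middle_system = [[1, 1, 0, 0], [0, -3, 3, 3], [0, -3, -1, 3]]
final_system = [[1, 1, 0, 0], [0, -3, 3, 3], [0, 0, -4, 0]]

print(is_echelon_form(middle_system))   # False: rows 2 and 3 both lead with y
print(is_echelon_form(final_system))    # True
```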

1.11 Example The prior three examples only used the operation of row combination. 
This linear system requires the swap operation to get it into echelon form 
because after the first combination 

$$\begin{aligned}
x - y &= 0 \\
2x - 2y + z + 2w &= 4 \\
y + w &= 0 \\
2z + w &= 5
\end{aligned}
\quad\xrightarrow{-2\rho_1+\rho_2}\quad
\begin{aligned}
x - y &= 0 \\
z + 2w &= 4 \\
y + w &= 0 \\
2z + w &= 5
\end{aligned}$$

the second equation has no leading $y$. To get one, we put in place a lower-down 
row that has a leading $y$. 

$$\xrightarrow{\rho_2 \leftrightarrow \rho_3}\quad
\begin{aligned}
x - y &= 0 \\
y + w &= 0 \\
z + 2w &= 4 \\
2z + w &= 5
\end{aligned}$$

(Had there been more than one suitable row below the second then we could 
have swapped in any one.) With that, Gauss's Method proceeds as before. 

$$\xrightarrow{-2\rho_3+\rho_4}\quad
\begin{aligned}
x - y &= 0 \\
y + w &= 0 \\
z + 2w &= 4 \\
-3w &= -3
\end{aligned}$$

Back-substitution gives $w = 1$, $z = 2$, $y = -1$, and $x = -1$. 

The row rescaling operation is not needed, strictly speaking, to solve linear 
systems. But we will use it later in this chapter as part of a variation on Gauss's 
Method, the Gauss-Jordan Method. 

All of the systems seen so far have the same number of equations as unknowns. 
All of them have a solution, and for all of them there is only one solution. We 
finish this subsection by seeing some other things that can happen. 

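1.12 Example This system has more equations than variables. 

$$\begin{aligned}
x + 3y &= 1 \\
2x + y &= -3 \\
2x + 2y &= -2
\end{aligned}$$

Gauss's Method helps us understand this system also, since this 

$$\xrightarrow[-2\rho_1+\rho_3]{-2\rho_1+\rho_2}\quad
\begin{aligned}
x + 3y &= 1 \\
-5y &= -5 \\
-4y &= -4
\end{aligned}$$

shows that one of the equations is redundant. Echelon form 

$$\xrightarrow{-(4/5)\rho_2+\rho_3}\quad
\begin{aligned}
x + 3y &= 1 \\
-5y &= -5 \\
0 &= 0
\end{aligned}$$

gives that $y = 1$ and $x = -2$. The '0 = 0' reflects the redundancy. 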

Gauss's Method is also useful on systems with more variables than equations. 
Many examples are in the next subsection. 

Another way that linear systems can differ from the examples shown earlier 
is that some linear systems do not have a unique solution. This can happen in 
two ways. 

The first is that a system can fail to have any solution at all. 

1.13 Example Contrast the system in the last example with this one. 

$$\begin{aligned}
x + 3y &= 1 \\
2x + y &= -3 \\
2x + 2y &= 0
\end{aligned}
\quad\xrightarrow[-2\rho_1+\rho_3]{-2\rho_1+\rho_2}\quad
\begin{aligned}
x + 3y &= 1 \\
-5y &= -5 \\
-4y &= -2
\end{aligned}$$

Here the system is inconsistent: no pair of numbers satisfies all of the equations 
simultaneously. Echelon form makes this inconsistency obvious. 

$$\xrightarrow{-(4/5)\rho_2+\rho_3}\quad
\begin{aligned}
x + 3y &= 1 \\
-5y &= -5 \\
0 &= 2
\end{aligned}$$

The solution set is empty. 

1.14 Example The prior system has more equations than unknowns, but that 
is not what causes the inconsistency: Example 1.12 has more equations than 
unknowns and yet is consistent. Nor is having more equations than unknowns 
necessary for inconsistency, as we see with this inconsistent system that has the 
same number of equations as unknowns. 

$$\begin{aligned}
x + 2y &= 8 \\
2x + 4y &= 8
\end{aligned}
\quad\xrightarrow{-2\rho_1+\rho_2}\quad
\begin{aligned}
x + 2y &= 8 \\
0 &= -8
\end{aligned}$$

The other way that a linear system can fail to have a unique solution, besides 
having no solutions, is to have many solutions. 

1.15 Example In this system 

$$\begin{aligned}
x + y &= 4 \\
2x + 2y &= 8
\end{aligned}$$

any pair of real numbers $(s_1, s_2)$ satisfying the first equation also satisfies the 
second. The solution set $\{(x, y) \mid x + y = 4\}$ is infinite; some of its members 
are $(0, 4)$, $(-1, 5)$, and $(2.5, 1.5)$. 

The result of applying Gauss's Method here contrasts with the prior example 
because we do not get a contradictory equation. 

$$\xrightarrow{-2\rho_1+\rho_2}\quad
\begin{aligned}
x + y &= 4 \\
0 &= 0
\end{aligned}$$



Don't be fooled by the final system in that example. A '0 = 0' equation is 
not the signal that a system has many solutions. 

1.16 Example The absence of a '0 = 0' does not keep a system from having 
many different solutions. This system is in echelon form and has no '0 = 0', but it has 
infinitely many solutions. 

$$\begin{aligned}
x + y + z &= 0 \\
y + z &= 0
\end{aligned}$$

Some solutions are: $(0, 1, -1)$, $(0, 1/2, -1/2)$, $(0, 0, 0)$, and $(0, -\pi, \pi)$. There are 
infinitely many solutions because any triple whose first component is 0 and 
whose second component is the negative of the third is a solution. 

Nor does the presence of a '0 = 0' mean that the system must have many 
solutions. Example 1.12 shows that. So does this system, which does not have 
any solutions at all despite that in echelon form it has a '0 = 0' row. 

$$\begin{aligned}
2x - 2z &= 6 \\
y + z &= 1 \\
2x + y - z &= 7 \\
3y + 3z &= 0
\end{aligned}
\quad\xrightarrow{-\rho_1+\rho_3}\quad
\begin{aligned}
2x - 2z &= 6 \\
y + z &= 1 \\
y + z &= 1 \\
3y + 3z &= 0
\end{aligned}$$

$$\xrightarrow[-3\rho_2+\rho_4]{-\rho_2+\rho_3}\quad
\begin{aligned}
2x - 2z &= 6 \\
y + z &= 1 \\
0 &= 0 \\
0 &= -3
\end{aligned}$$



We will finish this subsection with a summary of what we've seen so far 
about Gauss's Method. 

Gauss's Method uses the three row operations to set a system up for back 
substitution. If any step shows a contradictory equation then we can stop with 
the conclusion that the system has no solutions. If we reach echelon form without 
a contradictory equation, and each variable is a leading variable in its row, then 
the system has a unique solution and we find it by back substitution. Finally, 
if we reach echelon form without a contradictory equation, and there is not a 
unique solution — that is, at least one variable is not a leading variable — then 
the system has many solutions. 

The next subsection deals with the third case. We will see that such a system 
must have infinitely many solutions and we will describe the solution set. 
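
The summary above amounts to a three-way decision procedure. The following sketch, which is an illustration and not the book's own presentation, reduces a system to echelon form using only swaps and row combinations and then reports which case applies. The name `gauss_classify` and the string labels are assumptions.

```python
from fractions import Fraction

def gauss_classify(rows):
    """Reduce to echelon form and report 'none', 'unique', or 'many' solutions.

    Each row lists the coefficients followed by the constant.
    """
    rows = [[Fraction(entry) for entry in row] for row in rows]
    n_vars = len(rows[0]) - 1
    pivot_row = 0                          # also counts the leading variables
    for col in range(n_vars):
        # Look at or below pivot_row for a nonzero entry in this column.
        candidates = [r for r in range(pivot_row, len(rows)) if rows[r][col] != 0]
        if not candidates:
            continue                       # this variable does not lead any row
        r = candidates[0]
        if r != pivot_row:                 # swap only two distinct rows
            rows[pivot_row], rows[r] = rows[r], rows[pivot_row]
        for below in range(pivot_row + 1, len(rows)):
            k = -rows[below][col] / rows[pivot_row][col]
            rows[below] = [k * a + b for a, b in zip(rows[pivot_row], rows[below])]
        pivot_row += 1
        if pivot_row == len(rows):
            break
    # A row reading 0 = (nonzero number) is a contradictory equation.
    if any(all(a == 0 for a in row[:-1]) and row[-1] != 0 for row in rows):
        return "none"
    return "unique" if pivot_row == n_vars else "many"

print(gauss_classify([[1, 3, 1], [2, 1, -3], [2, 2, -2]]))   # Example 1.12: unique
print(gauss_classify([[1, 3, 1], [2, 1, -3], [2, 2, 0]]))    # Example 1.13: none
print(gauss_classify([[1, 1, 4], [2, 2, 8]]))                # Example 1.15: many
```

The sketch checks for a contradictory equation only after the reduction finishes; stopping at the first contradiction, as the text suggests, would give the same answer sooner.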

Note For all exercises, you must justify your answer. For instance, if a 
question asks whether a system has a solution then you must justify a 
yes response by producing the solution and must justify a no response by 
showing that no solution exists. 

Exercises 

✓ 1.17 Use Gauss's Method to find the unique solution for each system. 

, , 2x + 3y = 13 \ T , 1 
(a) ^_ (b) 3x + y =1 

^ -x + y+ z = 4 

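✓ 1.18 Use Gauss's Method to solve each system or conclude 'many solutions' or 'no 
solutions'. 

(a)
$$\begin{aligned}
2x + 2y &= 5 \\
x - 4y &= 0
\end{aligned}$$

(b)
$$\begin{aligned}
-x + y &= 1 \\
x + y &= 2
\end{aligned}$$

(c)
$$\begin{aligned}
x - 3y + z &= 1 \\
x + y + 2z &= 14
\end{aligned}$$

(d)
$$\begin{aligned}
-x - y &= 1 \\
-3x - 3y &= 2
\end{aligned}$$

(e)
$$\begin{aligned}
4y + z &= 20 \\
2x - 2y + z &= 0 \\
x + z &= 5 \\
x + y - z &= 10
\end{aligned}$$

(f)
$$\begin{aligned}
2x + z + w &= 5 \\
y - w &= -1 \\
3x - z - w &= 0 \\
4x + y + 2z + w &= 9
\end{aligned}$$
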

✓ 1.19 We can solve linear systems by methods other than Gauss's. One often taught 
in high school is to solve one of the equations for a variable, then substitute the 
resulting expression into other equations. Then we repeat that step until there 
is an equation with only one variable. From that we get the first number in the 
solution and then we get the rest with back-substitution. This method takes longer 
than Gauss's Method, since it involves more arithmetic operations, and is also 
more likely to lead to errors. To illustrate how it can lead to wrong conclusions, 
we will use the system 

$$\begin{aligned}
x + 3y &= 1 \\
2x + y &= -3 \\
2x + 2y &= 0
\end{aligned}$$

from Example 1.13. 

(a) Solve the first equation for x and substitute that expression into the second 
equation. Find the resulting y. 

(b) Again solve the first equation for x, but this time substitute that expression 
into the third equation. Find this y. 

What extra step must a user of this method take to avoid erroneously concluding a 
system has a solution? 

✓ 1.20 For which values of $k$ are there no solutions, many solutions, or a unique 
solution to this system? 

$$\begin{aligned}
x - y &= 1 \\
3x - 3y &= k
\end{aligned}$$

✓ 1.21 This system is not linear, in some sense, 

$$\begin{aligned}
2\sin\alpha - \cos\beta + 3\tan\gamma &= 3 \\
4\sin\alpha + 2\cos\beta - 2\tan\gamma &= 10 \\
6\sin\alpha - 3\cos\beta + \tan\gamma &= 9
\end{aligned}$$

and yet we can nonetheless apply Gauss's Method. Do so. Does the system have a 
solution? 

✓ 1.22 [Anton] What conditions must the constants, the $b$'s, satisfy so that each of 
these systems has a solution? Hint. Apply Gauss's Method and see what happens 
to the right side. 

(a)
$$\begin{aligned}
x - 3y &= b_1 \\
3x + y &= b_2 \\
x + 7y &= b_3 \\
2x + 4y &= b_4
\end{aligned}$$

(b)
$$\begin{aligned}
x_1 + 2x_2 + 3x_3 &= b_1 \\
2x_1 + 5x_2 + 3x_3 &= b_2 \\
x_1 + 8x_3 &= b_3
\end{aligned}$$

1.23 True or false: a system with more unknowns than equations has at least one 
solution. (As always, to say 'true' you must prove it, while to say 'false' you must 
produce a counterexample.) 

1.24 Must any Chemistry problem like the one that starts this subsection — a balance 
the reaction problem — have infinitely many solutions? 

✓ 1.25 Find the coefficients $a$, $b$, and $c$ so that the graph of $f(x) = ax^2 + bx + c$ passes 
through the points $(1, 2)$, $(-1, 6)$, and $(2, 3)$. 

1.26 After Theorem 1.5 we note that multiplying a row by 0 is not allowed because 
that could change a solution set. Give an example of a system with solution set $S_0$ 
where after multiplying a row by 0 the new system has a solution set $S_1$ and $S_0$ is 
a proper subset of $S_1$. Give an example where $S_0 = S_1$. 

1.27 Gauss's Method works by combining the equations in a system to make new 
equations. 

(a) Can we derive the equation $3x - 2y = 5$ by a sequence of Gaussian reduction 
steps from the equations in this system? 

$$\begin{aligned}
x + y &= 1 \\
4x - y &= 6
\end{aligned}$$

(b) Can we derive the equation $5x - 3y = 2$ with a sequence of Gaussian reduction 
steps from the equations in this system? 

$$\begin{aligned}
2x + 2y &= 5 \\
3x + y &= 4
\end{aligned}$$

(c) Can we derive $6x - 9y + 5z = -2$ by a sequence of Gaussian reduction steps 
from the equations in the system? 

$$\begin{aligned}
2x + y - z &= 4 \\
6x - 3y + z &= 5
\end{aligned}$$

1.28 Prove that, where $a, b, \ldots, e$ are real numbers and $a \neq 0$, if 

$$ax + by = c$$

has the same solution set as 

$$ax + dy = e$$

then they are the same equation. What if $a = 0$? 

✓ 1.29 Show that if $ad - bc \neq 0$ then 

$$\begin{aligned}
ax + by &= j \\
cx + dy &= k
\end{aligned}$$

has a unique solution. 

✓ 1.30 In the system 

ax + by = c 
dx + ey = f 

each of the equations describes a line in the xy-plane. By geometrical reasoning, 
show that there are three possibilities: there is a unique solution, there is no 
solution, and there are infinitely many solutions. 

1.31 Finish the proof of Theorem 1.5. 

1.32 Is there a two-unknowns linear system whose solution set is all of $\mathbb{R}^2$? 

✓ 1.33 Are any of the operations used in Gauss's Method redundant? That is, can we 
make any of the operations from a combination of the others? 

1.34 Prove that each operation of Gauss's Method is reversible. That is, show that 
if two systems are related by a row operation $S_1 \to S_2$ then there is a row operation 
to go back $S_2 \to S_1$. 

? 1.35 [Anton] A box holding pennies, nickels and dimes contains thirteen coins with 

a total value of 83 cents. How many coins of each type are in the box? 
? 1.36 [Con. Prob. 1955] Four positive integers are given. Select any three of the 

integers, find their arithmetic average, and add this result to the fourth integer. 

Thus the numbers 29, 23, 21, and 17 are obtained. One of the original integers 

is: 

(a) 19 (b) 21 (c) 23 (d) 29 (e) 17 
? ✓ 1.37 [Am. Math. Mon., Jan. 1935] Laugh at this: AHAHA + TEHE = TEHAW. It 
resulted from substituting a code letter for each digit of a simple example in 
addition, and it is required to identify the letters and prove the solution unique. 
? 1.38 [Wohascum no. 2] The Wohascum County Board of Commissioners, which has 
20 members, recently had to elect a President. There were three candidates (A, B, 
and C); on each ballot the three candidates were to be listed in order of preference, 
with no abstentions. It was found that 11 members, a majority, preferred A over 
B (thus the other 9 preferred B over A). Similarly, it was found that 12 members 
preferred C over A. Given these results, it was suggested that B should withdraw, 
to enable a runoff election between A and C. However, B protested, and it was 
then found that 14 members preferred B over C! The Board has not yet recovered 
from the resulting confusion. Given that every possible order of A, B, C appeared 
on at least one ballot, how many members voted for B as their first choice? 
? 1.39 [Am. Math. Mon., Jan. 1963] "This system of n linear equations with n un- 
knowns," said the Great Mathematician, "has a curious property." 
"Good heavens!" said the Poor Nut, "What is it?" 

"Note," said the Great Mathematician, "that the constants are in arithmetic 
progression." 

"It's all so clear when you explain it!" said the Poor Nut. "Do you mean like 
6x + 9y = 12 and 15x + 18y = 21?" 

"Quite so," said the Great Mathematician, pulling out his bassoon. "Indeed, 
the system has a unique solution. Can you find it?" 

"Good heavens!" cried the Poor Nut, "I am baffled." 

Are you? 



I.2 Describing the Solution Set 

A linear system with a unique solution has a solution set with one element. A 
linear system with no solution has a solution set that is empty. In these cases 
the solution set is easy to describe. Solution sets are a challenge to describe only 
when they contain many elements. 

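2.1 Example This system has many solutions because in echelon form 

\[
\begin{array}{rcl}
2x + z &=& 3 \\
x - y - z &=& 1 \\
3x - y &=& 4
\end{array}
\;\xrightarrow[-(3/2)\rho_1+\rho_3]{-(1/2)\rho_1+\rho_2}\;
\begin{array}{rcl}
2x + z &=& 3 \\
-y - (3/2)z &=& -1/2 \\
-y - (3/2)z &=& -1/2
\end{array}
\;\xrightarrow{-\rho_2+\rho_3}\;
\begin{array}{rcl}
2x + z &=& 3 \\
-y - (3/2)z &=& -1/2 \\
0 &=& 0
\end{array}
\]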

not all of the variables are leading variables. Theorem 1.5 shows that an $(x,y,z)$ 
satisfies the first system if and only if it satisfies the third. So we can describe 
the solution set $\{(x,y,z) \mid 2x + z = 3 \text{ and } x - y - z = 1 \text{ and } 3x - y = 4\}$ in this 
way. 

\[ \{(x,y,z) \mid 2x + z = 3 \text{ and } -y - 3z/2 = -1/2\} \qquad (*) \]

This description is better because it has two equations instead of three, but it is 
not optimal because it still has some hard-to-understand interactions among the 
variables. 

To improve it, use the variable that does not lead any equation, $z$, to describe 
the variables that do lead, $x$ and $y$. The second equation gives $y = (1/2) - (3/2)z$ 
and the first equation gives $x = (3/2) - (1/2)z$. Thus we can describe the solution 
set in this way. 

\[ \{(x,y,z) = ((3/2) - (1/2)z,\ (1/2) - (3/2)z,\ z) \mid z \in \mathbb{R}\} \qquad (**) \]

Compared with (*), the advantage of (**) is that $z$ can be any real number. 
This makes the job of deciding which tuples are in the solution set much easier. 
For instance, taking $z = 2$ shows that $(1/2, -5/2, 2)$ is a solution. 
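
The claim that every choice of z gives a solution can be spot-checked mechanically. The following is a minimal sketch, not part of the original text; the function names `solution` and `satisfies_original` are illustrative only. It uses exact rational arithmetic so that no rounding is involved.

```python
from fractions import Fraction

def solution(z):
    """Tuple (x, y, z) that description (**) assigns to the parameter z."""
    z = Fraction(z)
    x = Fraction(3, 2) - Fraction(1, 2) * z
    y = Fraction(1, 2) - Fraction(3, 2) * z
    return (x, y, z)

def satisfies_original(x, y, z):
    """Check all three equations of the original system of Example 2.1."""
    return 2 * x + z == 3 and x - y - z == 1 and 3 * x - y == 4

# Try a few parameter values, including a non-integer one.
for z in (-4, 0, 2, Fraction(7, 3)):
    assert satisfies_original(*solution(z))

print(solution(2))  # the tuple (1/2, -5/2, 2) mentioned in the text
```

A finite number of checks is of course not a proof; the argument in the text, via Theorem 1.5, is what shows that (**) describes the whole solution set.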

2.2 Definition In an echelon form linear system the variables that are not leading 
are free. 

2.3 Example Reduction of a linear system can end with more than one variable 
free. On this system Gauss's Method 

\[
\begin{array}{rcl}
x + y + z - w &=& 1 \\
y - z + w &=& -1 \\
3x + 6z - 6w &=& 6 \\
-y + z - w &=& 1
\end{array}
\;\xrightarrow{-3\rho_1+\rho_3}\;
\begin{array}{rcl}
x + y + z - w &=& 1 \\
y - z + w &=& -1 \\
-3y + 3z - 3w &=& 3 \\
-y + z - w &=& 1
\end{array}
\;\xrightarrow[\rho_2+\rho_4]{3\rho_2+\rho_3}\;
\begin{array}{rcl}
x + y + z - w &=& 1 \\
y - z + w &=& -1 \\
0 &=& 0 \\
0 &=& 0
\end{array}
\]

leaves $x$ and $y$ leading, and both $z$ and $w$ free. To get the description that we 
prefer we work from the bottom. We first express the leading variable $y$ in terms 
of $z$ and $w$, with $y = -1 + z - w$. Moving up to the top equation, substituting 
for $y$ gives $x + (-1 + z - w) + z - w = 1$ and solving for $x$ leaves $x = 2 - 2z + 2w$. 
The solution set 

\[ \{(2 - 2z + 2w,\ -1 + z - w,\ z,\ w) \mid z, w \in \mathbb{R}\} \qquad (**) \]

has the leading variables in terms of the free variables. 

2.4 Example The list of leading variables may skip over some columns. After 
this reduction 

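\[
\begin{array}{rcl}
2x - 2y &=& 0 \\
z + 3w &=& 2 \\
3x - 3y &=& 0 \\
x - y + 2z + 6w &=& 4
\end{array}
\;\xrightarrow[-(1/2)\rho_1+\rho_4]{-(3/2)\rho_1+\rho_3}\;
\begin{array}{rcl}
2x - 2y &=& 0 \\
z + 3w &=& 2 \\
0 &=& 0 \\
2z + 6w &=& 4
\end{array}
\;\xrightarrow{-2\rho_2+\rho_4}\;
\begin{array}{rcl}
2x - 2y &=& 0 \\
z + 3w &=& 2 \\
0 &=& 0 \\
0 &=& 0
\end{array}
\]

$x$ and $z$ are the leading variables, not $x$ and $y$. The free variables are $y$ and $w$ 
and so we can describe the solution set as $\{(y, y, 2 - 3w, w) \mid y, w \in \mathbb{R}\}$. For 
instance, $(1, 1, 2, 0)$ satisfies the system: take $y = 1$ and $w = 0$. The four-tuple 
$(1, 0, 5, 4)$ is not a solution since its first coordinate does not equal its second. 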

A variable that we use to describe a family of solutions is a parameter. We 
say that the solution set in the prior example is parametrized with y and w. 
(The terms 'parameter' and 'free variable' do not mean the same thing. In the 
prior example y and w are free because in the echelon form system they do not 
lead while they are parameters because of how we used them to describe the set 
of solutions. Had we instead rewritten the second equation as w = 2/3 — (1/3)z 
then the free variables would still be y and w but the parameters would be y 
and z.) 

In the rest of this book we will solve linear systems by bringing them to 
echelon form and then using the free variables as parameters in the description 
of the solution set. 
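
The procedure just described, reduce to echelon form and then read off which variables lead and which are free, can be sketched in code. This is only an illustration under stated assumptions: the function name `echelon_form` and the representation of a system as a list of augmented rows (coefficients followed by the constant) are choices made here, not notation from the text. Exact fractions avoid rounding.

```python
from fractions import Fraction

def echelon_form(rows):
    """Reduce an augmented matrix to echelon form with row swaps and
    row combinations. Returns (matrix, leading_columns, free_columns)."""
    m = [[Fraction(e) for e in row] for row in rows]
    n_vars = len(m[0]) - 1          # the last column holds the constants
    pivot_row = 0
    leading = []
    for col in range(n_vars):
        # Rows at or below pivot_row that could lead with this variable.
        candidates = [r for r in range(pivot_row, len(m)) if m[r][col] != 0]
        if not candidates:
            continue                # no row leads with this variable: it is free
        r = candidates[0]
        m[pivot_row], m[r] = m[r], m[pivot_row]
        for below in range(pivot_row + 1, len(m)):
            factor = m[below][col] / m[pivot_row][col]
            m[below] = [b - factor * p for b, p in zip(m[below], m[pivot_row])]
        leading.append(col)
        pivot_row += 1
        if pivot_row == len(m):
            break
    free = [c for c in range(n_vars) if c not in leading]
    return m, leading, free

# Example 2.4, with columns ordered x, y, z, w.
_, leading, free = echelon_form([
    [2, -2, 0, 0, 0],
    [0, 0, 1, 3, 2],
    [3, -3, 0, 0, 0],
    [1, -1, 2, 6, 4],
])
print(leading, free)  # [0, 2] [1, 3]: x and z lead, y and w are free
```

The sketch swaps in the first usable row whenever the current row cannot lead, which is one of several legitimate choices in Gauss's Method.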

2.5 Example This is another system with infinitely many solutions. 

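\[
\begin{array}{rcl}
x + 2y &=& 1 \\
2x + z &=& 2 \\
3x + 2y + z - w &=& 4
\end{array}
\;\xrightarrow[-3\rho_1+\rho_3]{-2\rho_1+\rho_2}\;
\begin{array}{rcl}
x + 2y &=& 1 \\
-4y + z &=& 0 \\
-4y + z - w &=& 1
\end{array}
\;\xrightarrow{-\rho_2+\rho_3}\;
\begin{array}{rcl}
x + 2y &=& 1 \\
-4y + z &=& 0 \\
-w &=& 1
\end{array}
\]

The leading variables are $x$, $y$, and $w$. The variable $z$ is free. Notice that, 
although there are infinitely many solutions, the value of $w$ doesn't vary but is 
constant at $-1$. To parametrize, write $w$ in terms of $z$ with $w = -1 + 0z$. Then 
$y = (1/4)z$. Substitute for $y$ in the first equation to get $x = 1 - (1/2)z$. The 
solution set is $\{(1 - (1/2)z,\ (1/4)z,\ z,\ -1) \mid z \in \mathbb{R}\}$. 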

Parametrizing solution sets shows that systems with free variables have 
infinitely many solutions. In the prior example, z takes on all real number values, 
each associated with an element of the solution set, and so there are infinitely 
many such elements. 

We finish this subsection by developing a streamlined notation for linear 
systems and their solution sets. 

2.6 Definition An $m \times n$ matrix is a rectangular array of numbers with $m$ rows 
and $n$ columns. Each number in the matrix is an entry. 

Matrices are usually named by upper case roman letters such as $A$. For 
instance, 

\[ A = \begin{pmatrix} 1 & 2.2 & 5 \\ 3 & 4 & -7 \end{pmatrix} \]

has 2 rows and 3 columns and so is a $2 \times 3$ matrix. Read that aloud as 
"two-by-three"; the number of rows is always first. We denote entries with the 
corresponding lower-case letter so that $a_{i,j}$ is the number in row $i$ and column $j$ 
of the array. The entry in the second row and first column is $a_{2,1} = 3$. Note 
that the order of the subscripts matters: $a_{1,2} \neq a_{2,1}$ since $a_{1,2} = 2.2$. (The 
parentheses around the array are so that when two matrices are adjacent then 
we can tell where one ends and the next one begins.) 

Matrices occur throughout this book. We shall use $\mathcal{M}_{n \times m}$ to denote the 
collection of $n \times m$ matrices. 

2.7 Example We can abbreviate this linear system 

\[
\begin{array}{rcl}
x + 2y &=& 4 \\
y - z &=& 0 \\
x + 2z &=& 4
\end{array}
\]

with this matrix. 

\[
\left(\begin{array}{ccc|c}
1 & 2 & 0 & 4 \\
0 & 1 & -1 & 0 \\
1 & 0 & 2 & 4
\end{array}\right)
\]

The vertical bar just reminds a reader of the difference between the coefficients 
on the system's left hand side and the constants on the right. With a bar, this 
is an augmented matrix. In this notation the clerical load of Gauss's Method, 
the copying of variables and the writing of +'s and ='s, is lighter. 

\[
\left(\begin{array}{ccc|c}
1 & 2 & 0 & 4 \\
0 & 1 & -1 & 0 \\
1 & 0 & 2 & 4
\end{array}\right)
\;\xrightarrow{-\rho_1+\rho_3}\;
\left(\begin{array}{ccc|c}
1 & 2 & 0 & 4 \\
0 & 1 & -1 & 0 \\
0 & -2 & 2 & 0
\end{array}\right)
\;\xrightarrow{2\rho_2+\rho_3}\;
\left(\begin{array}{ccc|c}
1 & 2 & 0 & 4 \\
0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0
\end{array}\right)
\]

The second row stands for $y - z = 0$ and the first row stands for $x + 2y = 4$ so 
the solution set is $\{(4 - 2z, z, z) \mid z \in \mathbb{R}\}$. 

We will also use the matrix notation to clarify the descriptions of solution sets. 
The description $\{(2 - 2z + 2w,\ -1 + z - w,\ z,\ w) \mid z, w \in \mathbb{R}\}$ from Example 2.3 
is hard to read. We will rewrite it to group all the constants together, all the 
coefficients of $z$ together, and all the coefficients of $w$ together. We will write 
them vertically, in one-column matrices. 

\[
\{ \begin{pmatrix} 2 \\ -1 \\ 0 \\ 0 \end{pmatrix}
 + \begin{pmatrix} -2 \\ 1 \\ 1 \\ 0 \end{pmatrix} \cdot z
 + \begin{pmatrix} 2 \\ -1 \\ 0 \\ 1 \end{pmatrix} \cdot w
 \mid z, w \in \mathbb{R} \}
\]

For instance, the top line says that $x = 2 - 2z + 2w$ and the second line says 
that $y = -1 + z - w$. The next section gives a geometric interpretation that will 
help us picture the solution sets. 


2.8 Definition A vector (or column vector) is a matrix with a single column. 
A matrix with a single row is a row vector. The entries of a vector are its 
components. A column or row vector whose components are all zeros is a zero 
vector. 



Vectors are an exception to the convention of representing matrices with 
capital roman letters. We use lower-case roman or greek letters overlined with an 
arrow: $\vec{a}, \vec{b}, \ldots$ or $\vec{\alpha}, \vec{\beta}, \ldots$ (boldface is also common: $\mathbf{a}$ or $\boldsymbol{\alpha}$). For instance, 
this is a column vector with a third component of 7. 

\[ \vec{v} = \begin{pmatrix} 1 \\ 3 \\ 7 \end{pmatrix} \]

A zero vector is denoted $\vec{0}$. There are many different zero vectors, e.g., the 
one-tall zero vector, the two-tall zero vector, etc. Nonetheless we will usually 
say "the" zero vector, expecting that the size will be clear from the context. 



2.9 Definition The linear equation $a_1x_1 + a_2x_2 + \cdots + a_nx_n = d$ with unknowns 
$x_1, \ldots, x_n$ is satisfied by 

\[ \vec{s} = \begin{pmatrix} s_1 \\ \vdots \\ s_n \end{pmatrix} \]

if $a_1s_1 + a_2s_2 + \cdots + a_ns_n = d$. A vector satisfies a linear system if it satisfies 
each equation in the system. 

The style of description of solution sets that we use involves adding the 
vectors, and also multiplying them by real numbers, such as the z and w. We 
need to define these operations. 

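2.10 Definition The vector sum of $\vec{u}$ and $\vec{v}$ is the vector of the sums. 

\[
\vec{u} + \vec{v}
= \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}
+ \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}
= \begin{pmatrix} u_1 + v_1 \\ \vdots \\ u_n + v_n \end{pmatrix}
\]
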

Note that the vectors must have the same number of entries for the addition 
to be defined. This entry-by-entry addition works for any pair of matrices, not 
just vectors, provided that they have the same number of rows and columns. 

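2.11 Definition The scalar multiplication of the real number $r$ and the vector $\vec{v}$ 
is the vector of the multiples. 

\[
r \cdot \vec{v}
= r \cdot \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}
= \begin{pmatrix} rv_1 \\ \vdots \\ rv_n \end{pmatrix}
\]
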

As with the addition operation, the entry-by-entry scalar multiplication 
operation extends beyond just vectors to any matrix. 

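We write scalar multiplication in either order, as $r \cdot \vec{v}$ or $\vec{v} \cdot r$, or without the 
'$\cdot$' symbol: $r\vec{v}$. (Do not refer to scalar multiplication as 'scalar product' because 
we use that name for a different operation.) 

2.12 Example 

\[
\begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}
+ \begin{pmatrix} 3 \\ -1 \\ 4 \end{pmatrix}
= \begin{pmatrix} 2 + 3 \\ 3 - 1 \\ 1 + 4 \end{pmatrix}
= \begin{pmatrix} 5 \\ 2 \\ 5 \end{pmatrix}
\qquad
7 \cdot \begin{pmatrix} 1 \\ 4 \\ -1 \\ -3 \end{pmatrix}
= \begin{pmatrix} 7 \\ 28 \\ -7 \\ -21 \end{pmatrix}
\]
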
Notice that the definitions of vector addition and scalar multiplication agree 
where they overlap, for instance, v + v = 2v. 
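
The two entry-by-entry operations translate directly into code. This is a minimal sketch in which vectors are plain Python lists; the names `vector_sum` and `scalar_multiple` are illustrative, not notation from the text.

```python
def vector_sum(u, v):
    """Entry-by-entry sum; defined only for vectors of the same size."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return [a + b for a, b in zip(u, v)]

def scalar_multiple(r, v):
    """Multiply every component of v by the real number r."""
    return [r * a for a in v]

v = [1, 4, -1, -3]
print(scalar_multiple(7, v))                      # [7, 28, -7, -21]
print(vector_sum(v, v) == scalar_multiple(2, v))  # True: v + v = 2v
```

The size check mirrors the remark after Definition 2.10 that the sum is defined only when the two vectors have the same number of entries.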

With the notation defined, we can now solve systems in the way that we will 
use from now on. 



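2.13 Example This system 

\[
\begin{array}{rcl}
2x + y - w &=& 4 \\
y + w + u &=& 4 \\
x - z + 2w &=& 0
\end{array}
\]

reduces in this way. 

\[
\left(\begin{array}{ccccc|c}
2 & 1 & 0 & -1 & 0 & 4 \\
0 & 1 & 0 & 1 & 1 & 4 \\
1 & 0 & -1 & 2 & 0 & 0
\end{array}\right)
\;\xrightarrow{-(1/2)\rho_1+\rho_3}\;
\left(\begin{array}{ccccc|c}
2 & 1 & 0 & -1 & 0 & 4 \\
0 & 1 & 0 & 1 & 1 & 4 \\
0 & -1/2 & -1 & 5/2 & 0 & -2
\end{array}\right)
\;\xrightarrow{(1/2)\rho_2+\rho_3}\;
\left(\begin{array}{ccccc|c}
2 & 1 & 0 & -1 & 0 & 4 \\
0 & 1 & 0 & 1 & 1 & 4 \\
0 & 0 & -1 & 3 & 1/2 & 0
\end{array}\right)
\]

The solution set is $\{(w + (1/2)u,\ 4 - w - u,\ 3w + (1/2)u,\ w,\ u) \mid w, u \in \mathbb{R}\}$. We 
write that in vector form. 

\[
\{ \begin{pmatrix} x \\ y \\ z \\ w \\ u \end{pmatrix}
= \begin{pmatrix} 0 \\ 4 \\ 0 \\ 0 \\ 0 \end{pmatrix}
+ \begin{pmatrix} 1 \\ -1 \\ 3 \\ 1 \\ 0 \end{pmatrix} w
+ \begin{pmatrix} 1/2 \\ -1 \\ 1/2 \\ 0 \\ 1 \end{pmatrix} u
\mid w, u \in \mathbb{R} \}
\]

Note how well vector notation sets off the coefficients of each parameter. For 
instance, the third row of the vector form shows plainly that if $u$ is fixed then $z$ 
increases three times as fast as $w$. Another thing shown plainly is that setting 
both $w$ and $u$ to zero gives that this vector 

\[
\begin{pmatrix} x \\ y \\ z \\ w \\ u \end{pmatrix}
= \begin{pmatrix} 0 \\ 4 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\]
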
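is a particular solution of the linear system. 

2.14 Example In the same way, this system 

\[
\begin{array}{rcl}
x - y + z &=& 1 \\
3x + z &=& 3 \\
5x - 2y + 3z &=& 5
\end{array}
\]

reduces 

\[
\left(\begin{array}{ccc|c}
1 & -1 & 1 & 1 \\
3 & 0 & 1 & 3 \\
5 & -2 & 3 & 5
\end{array}\right)
\;\xrightarrow[-5\rho_1+\rho_3]{-3\rho_1+\rho_2}\;
\left(\begin{array}{ccc|c}
1 & -1 & 1 & 1 \\
0 & 3 & -2 & 0 \\
0 & 3 & -2 & 0
\end{array}\right)
\;\xrightarrow{-\rho_2+\rho_3}\;
\left(\begin{array}{ccc|c}
1 & -1 & 1 & 1 \\
0 & 3 & -2 & 0 \\
0 & 0 & 0 & 0
\end{array}\right)
\]

to a one-parameter solution set 

\[
\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
+ \begin{pmatrix} -1/3 \\ 2/3 \\ 1 \end{pmatrix} z
\mid z \in \mathbb{R} \}
\]

As in the prior example, the vector not associated with the parameter 

\[ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \]

is a particular solution of the system. 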
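
The vector form of a solution set is easy to test numerically. The sketch below, an illustration rather than anything from the text, takes the system and solution set of Example 2.13 and checks that the particular solution plus any combination of the two parameter vectors satisfies every equation. The names `A`, `d`, `p`, `beta_w`, `beta_u`, `member`, and `satisfies` are assumptions made here.

```python
from fractions import Fraction as F

# Example 2.13, with columns ordered x, y, z, w, u.
A = [[2, 1, 0, -1, 0],
     [0, 1, 0, 1, 1],
     [1, 0, -1, 2, 0]]
d = [4, 4, 0]

p = [0, 4, 0, 0, 0]                 # particular solution
beta_w = [1, -1, 3, 1, 0]           # vector multiplying the parameter w
beta_u = [F(1, 2), -1, F(1, 2), 0, 1]  # vector multiplying the parameter u

def member(w, u):
    """The element of the solution set given by parameters w and u."""
    return [pi + w * bw + u * bu for pi, bw, bu in zip(p, beta_w, beta_u)]

def satisfies(v):
    """True if v satisfies each equation of the system."""
    return all(sum(a * x for a, x in zip(row, v)) == c for row, c in zip(A, d))

for w, u in [(0, 0), (1, 0), (0, 1), (F(-3, 2), 7)]:
    assert satisfies(member(w, u))
```

Setting both parameters to zero returns the particular solution itself, matching the remark in Example 2.13.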

Before the exercises, we will consider what we have accomplished and what 
we have yet to do. 

So far we have done the mechanics of Gauss's Method. Except for one result, 
Theorem 1.5, which we did because it says that the method gives the right 
answers, we have not stopped to consider any of the interesting questions that 
arise. 

For example, can we prove that we can always describe solution sets as above, 
with a particular solution vector added to an unrestricted linear combination 
of some other vectors? We've noted that the solution sets we described in this 
way have infinitely many solutions so an answer to this question would tell us 
about the size of solution sets. It will also help us understand the geometry of 
the solution sets. 

Many questions arise from our observation that we can do Gauss's Method 
in more than one way (for instance, when swapping rows we may have a choice 
of more than one row). Theorem 1.5 says that we must get the same solution 
set no matter how we proceed but if we do Gauss's Method in two ways must 
we get the same number of free variables in each echelon form system? Must 
those be the same variables, that is, is solving a problem one way to get y and 
w free and solving it another way to get y and z free impossible? 

In the rest of this chapter we will answer these questions. The answer to 
each is 'yes'. We do the first one, the proof about the description of solution sets, 
in the next subsection. Then, in the chapter's second section, we will describe 
the geometry of solution sets. After that, in this chapter's final section, we will 
settle the questions about the parameters. When we are done we will not only 
have a solid grounding in the practice of Gauss's Method, we will also have 
a solid grounding in the theory. We will know exactly what can and cannot 
happen in a reduction. 

Exercises 

✓ 2.15 Find the indicated entry of the matrix, if it is defined. 

\[ A = \begin{pmatrix} 1 & 3 & 1 \\ 2 & -1 & 4 \end{pmatrix} \]

(a) $a_{2,1}$ (b) $a_{1,2}$ (c) $a_{2,2}$ (d) $a_{3,1}$ 
✓ 2.16 Give the size of each matrix. 

[The matrices for this exercise are not legible in this copy.] 

✓ 2.17 Do the indicated vector operation, if it is defined. 

[The vector expressions for parts (a) through (e) are not legible in this copy.] 

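✓ 2.18 Solve each system using matrix notation. Express the solution using vectors. 

\[
\text{(a)}\ \begin{array}{rcl} 3x + 6y &=& 18 \\ x + 2y &=& 6 \end{array}
\qquad
\text{(b)}\ \begin{array}{rcl} x + y &=& 1 \\ x - y &=& -1 \end{array}
\qquad
\text{(c)}\ \begin{array}{rcl} x_1 + x_3 &=& 4 \\ x_1 - x_2 + 2x_3 &=& 5 \\ 4x_1 - x_2 + 5x_3 &=& 17 \end{array}
\]

\[
\text{(d)}\ \begin{array}{rcl} 2a + b - c &=& 2 \\ 2a + c &=& 3 \\ a - b &=& 0 \end{array}
\qquad
\text{(e)}\ \begin{array}{rcl} x + 2y - z &=& 3 \\ 2x + y + w &=& 4 \\ x - y + z + w &=& 1 \end{array}
\qquad
\text{(f)}\ \begin{array}{rcl} x + z + w &=& 4 \\ 2x + y - w &=& 2 \\ 3x + y + z &=& 7 \end{array}
\]
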

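✓ 2.19 Solve each system using matrix notation. Give each solution set in vector 
notation. 

\[
\text{(a)}\ \begin{array}{rcl} 2x + y - z &=& 1 \\ 4x - y &=& 3 \end{array}
\qquad
\text{(b)}\ \begin{array}{rcl} x - z &=& 1 \\ y + 2z - w &=& 3 \\ x + 2y + 3z - w &=& 7 \end{array}
\qquad
\text{(c)}\ \begin{array}{rcl} x - y + z &=& 0 \\ y + w &=& 0 \\ 3x - 2y + 3z + w &=& 0 \\ -y - w &=& 0 \end{array}
\]

\[
\text{(d)}\ \begin{array}{rcl} a + 2b + 3c + d - e &=& 1 \\ 3a - b + c + d + e &=& 3 \end{array}
\]
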


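✓ 2.20 The vector is in the set. What value of the parameters produces that vector? 

[The vectors and sets for this exercise are not legible in this copy; one surviving fragment shows parameters $m, n \in \mathbb{R}$.] 

2.21 Decide if the vector is in the set. 

[The vectors and sets for this exercise are not legible in this copy; surviving fragments show parameters $r$ and $j, k \in \mathbb{R}$.] 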

2.22 [Cleary] A farmer with 1200 acres is considering planting three different crops, 
corn, soybeans, and oats. The farmer wants to use all 1200 acres. Seed corn costs 
$20 per acre, while soybean and oat seed cost $50 and $12 per acre respectively. 
The farmer has $40 000 available to buy seed and intends to spend it all. 

(a) Use the information above to formulate two linear equations with three 
unknowns and solve it. 

(b) Solutions to the system are choices that the farmer can make. Write down 
two reasonable solutions. 

(c) Suppose that in the fall when the crops mature, the farmer can bring in 
revenue of $100 per acre for corn, $300 per acre for soybeans and $80 per acre 
for oats. Which of your two solutions in the prior part would have resulted in a 
larger revenue? 

2.23 Parametrize the solution set of this one-equation system. 

\[ x_1 + x_2 + \cdots + x_n = 0 \]

✓ 2.24 (a) Apply Gauss's Method to the left-hand side to solve 

\[
\begin{array}{rcl}
x + 2y - w &=& a \\
2x + z &=& b \\
x + y + 2w &=& c
\end{array}
\]

for $x$, $y$, $z$, and $w$, in terms of the constants $a$, $b$, and $c$. 
(b) Use your answer from the prior part to solve this. 

\[
\begin{array}{rcl}
x + 2y - w &=& 3 \\
2x + z &=& 1 \\
x + y + 2w &=& -2
\end{array}
\]

✓ 2.25 Why is the comma needed in the notation '$a_{i,j}$' for matrix entries? 

✓ 2.26 Give the $4 \times 4$ matrix whose $i, j$-th entry is 
(a) $i + j$; (b) $-1$ to the $i + j$ power. 

2.27 For any matrix $A$, the transpose of $A$, written $A^{T}$, is the matrix whose columns 
are the rows of $A$. Find the transpose of each of these. 

[The matrices for this exercise are not legible in this copy.] 

✓ 2.28 (a) Describe all functions $f(x) = ax^2 + bx + c$ such that $f(1) = 2$ and $f(-1) = 6$. 
(b) Describe all functions $f(x) = ax^2 + bx + c$ such that $f(1) = 2$. 
2.29 Show that any set of five points from the plane lie on a common conic section, 
that is, they all satisfy some equation of the form $ax^2 + by^2 + cxy + dx + ey + f = 0$ 
where some of $a, \ldots, f$ are nonzero. 
2.30 Make up a four equations/four unknowns system having 

(a) a one-parameter solution set; 

(b) a two-parameter solution set; 

(c) a three-parameter solution set. 

? 2.31 [Shepelev] This puzzle is from a Russian web-site http://www.arbuz.uz/ and 
there are many solutions to it, but mine uses linear algebra and is very naive. 
There's a planet inhabited by arbuzoids (watermeloners, to translate from Russian). 
Those creatures are found in three colors: red, green and blue. There are 13 red 
arbuzoids, 15 blue ones, and 17 green. When two differently colored arbuzoids 
meet, they both change to the third color. 

The question is, can it ever happen that all of them assume the same color? 

? 2.32 [USSR Olympiad no. 174] 

(a) Solve the system of equations. 

\[
\begin{array}{rcl}
ax + y &=& a^2 \\
x + ay &=& 1
\end{array}
\]

For what values of $a$ does the system fail to have solutions, and for what values 
of $a$ are there infinitely many solutions? 

(b) Answer the above question for the system. 

\[
\begin{array}{rcl}
ax + y &=& a^3 \\
x + ay &=& 1
\end{array}
\]

? 2.33 [Math. Mag., Sept. 1952] In air a gold-surfaced sphere weighs 7588 grams. It 
is known that it may contain one or more of the metals aluminum, copper, silver, 
or lead. When weighed successively under standard conditions in water, benzene, 
alcohol, and glycerin its respective weights are 6588, 6688, 6778, and 6328 grams. 
How much, if any, of the forenamed metals does it contain if the specific gravities 
of the designated substances are taken to be as follows? 

Aluminum 2.7 Alcohol 0.81 

Copper 8.9 Benzene 0.90 

Gold 19.3 Glycerin 1.26 

Lead 11.3 Water 1.00 

Silver 10.8 



1.3 General = Particular + Homogeneous 

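In the prior subsection the descriptions of solution sets all fit a pattern. They 
have a vector that is a particular solution of the system added to an unrestricted 
combination of some other vectors. The solution set from Example 2.13 
illustrates. 

\[
\{ \underbrace{\begin{pmatrix} 0 \\ 4 \\ 0 \\ 0 \\ 0 \end{pmatrix}}_{\text{particular solution}}
+ \underbrace{w \begin{pmatrix} 1 \\ -1 \\ 3 \\ 1 \\ 0 \end{pmatrix}
+ u \begin{pmatrix} 1/2 \\ -1 \\ 1/2 \\ 0 \\ 1 \end{pmatrix}}_{\text{unrestricted combination}}
\mid w, u \in \mathbb{R} \}
\]

The combination is unrestricted in that $w$ and $u$ can be any real numbers: 
there is no condition like "such that $2w - u = 0$" that would restrict which pairs 
$w, u$ we can use. 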

That example shows an infinite solution set fitting the pattern. The other 
two kinds of solution sets also fit. A one-element solution set fits because it has 
a particular solution and the unrestricted combination part is trivial. (That is, 
instead of being a combination of two vectors or of one vector, it is a combination 
of no vectors. We are using the convention that the sum of an empty set of 
vectors is the vector of all zeros.) A zero-element solution set fits the pattern 
because there is no particular solution and so there are no sums of that form. 

3.1 Theorem Any linear system's solution set has the form 

\[ \{ \vec{p} + c_1\vec{\beta}_1 + \cdots + c_k\vec{\beta}_k \mid c_1, \ldots, c_k \in \mathbb{R} \} \]

where $\vec{p}$ is any particular solution and where the number of vectors $\vec{\beta}_1, \ldots, \vec{\beta}_k$ 
equals the number of free variables that the system has after a Gaussian 
reduction. 

The solution description has two parts, the particular solution $\vec{p}$ and the 
unrestricted linear combination of the $\vec{\beta}$'s. We shall prove the theorem in two 
corresponding parts, with two lemmas. 

We will focus first on the unrestricted combination. For that we consider 
systems that have the vector of zeroes as a particular solution so that we can 
shorten $\vec{p} + c_1\vec{\beta}_1 + \cdots + c_k\vec{\beta}_k$ to $c_1\vec{\beta}_1 + \cdots + c_k\vec{\beta}_k$. 

3.2 Definition A linear equation is homogeneous if it has a constant of zero, so 
that it can be written as $a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$. 

3.3 Example With any linear system like 

\[
\begin{array}{rcl}
3x + 4y &=& 3 \\
2x - y &=& 1
\end{array}
\]

we associate a system of homogeneous equations by setting the right side to 
zeros. 

\[
\begin{array}{rcl}
3x + 4y &=& 0 \\
2x - y &=& 0
\end{array}
\]

Our interest in this comes from comparing the reduction of the original system 

\[
\begin{array}{rcl}
3x + 4y &=& 3 \\
2x - y &=& 1
\end{array}
\;\xrightarrow{-(2/3)\rho_1+\rho_2}\;
\begin{array}{rcl}
3x + 4y &=& 3 \\
-(11/3)y &=& -1
\end{array}
\]

with the reduction of the associated homogeneous system. 

\[
\begin{array}{rcl}
3x + 4y &=& 0 \\
2x - y &=& 0
\end{array}
\;\xrightarrow{-(2/3)\rho_1+\rho_2}\;
\begin{array}{rcl}
3x + 4y &=& 0 \\
-(11/3)y &=& 0
\end{array}
\]

Obviously the two reductions go in the same way. We can study how to reduce 
a linear system by instead studying how to reduce the associated homogeneous 
system. 

Studying the associated homogeneous system has a great advantage over 
studying the original system. Nonhomogeneous systems can be inconsistent. 
But a homogeneous system must be consistent since there is always at least one 
solution, the zero vector. 

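3.4 Example Some homogeneous systems have the zero vector as their only 
solution. 

\[
\begin{array}{rcl}
3x + 2y + z &=& 0 \\
6x + 4y &=& 0 \\
y + z &=& 0
\end{array}
\;\xrightarrow{-2\rho_1+\rho_2}\;
\begin{array}{rcl}
3x + 2y + z &=& 0 \\
-2z &=& 0 \\
y + z &=& 0
\end{array}
\;\xrightarrow{\rho_2 \leftrightarrow \rho_3}\;
\begin{array}{rcl}
3x + 2y + z &=& 0 \\
y + z &=& 0 \\
-2z &=& 0
\end{array}
\]
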


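3.5 Example Some homogeneous systems have many solutions. One example is 
the Chemistry problem from the first page of this book. 

\[
\begin{array}{rcl}
7x - 7z &=& 0 \\
8x + y - 5z - 2w &=& 0 \\
y - 3z &=& 0 \\
3y - 6z - w &=& 0
\end{array}
\;\xrightarrow{-(8/7)\rho_1+\rho_2}\;
\begin{array}{rcl}
7x - 7z &=& 0 \\
y + 3z - 2w &=& 0 \\
y - 3z &=& 0 \\
3y - 6z - w &=& 0
\end{array}
\]

\[
\xrightarrow[-3\rho_2+\rho_4]{-\rho_2+\rho_3}\;
\begin{array}{rcl}
7x - 7z &=& 0 \\
y + 3z - 2w &=& 0 \\
-6z + 2w &=& 0 \\
-15z + 5w &=& 0
\end{array}
\;\xrightarrow{-(5/2)\rho_3+\rho_4}\;
\begin{array}{rcl}
7x - 7z &=& 0 \\
y + 3z - 2w &=& 0 \\
-6z + 2w &=& 0 \\
0 &=& 0
\end{array}
\]

The solution set 

\[
\{ \begin{pmatrix} 1/3 \\ 1 \\ 1/3 \\ 1 \end{pmatrix} w \mid w \in \mathbb{R} \}
\]

has many vectors besides the zero vector (if we interpret $w$ as a number of 
molecules then solutions make sense only when $w$ is a nonnegative multiple of 
3). 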



3.6 Lemma For any homogeneous linear system there exist vectors $\vec{\beta}_1, \ldots, \vec{\beta}_k$ 
such that the solution set of the system is 

\[ \{ c_1\vec{\beta}_1 + \cdots + c_k\vec{\beta}_k \mid c_1, \ldots, c_k \in \mathbb{R} \} \]

where $k$ is the number of free variables in an echelon form version of the system. 

We will make two points before the proof. The first point is that the basic 
idea of the proof is straightforward. Consider a system of homogeneous equations 
in echelon form. 

\[
\begin{array}{rcl}
x + y + 2z + s + t &=& 0 \\
y + z + s - t &=& 0 \\
s + t &=& 0
\end{array}
\]

Start at the bottom, expressing its leading variable in terms of the free variables 
with $s = -t$. For the next row up, substitute the expression giving $s$ as a 
combination of free variables, $y + z + (-t) - t = 0$, and solve for its leading 
variable $y = -z + 2t$. Iterate: on the next row up, substitute expressions derived 
from prior rows, $x + (-z + 2t) + 2z + (-t) + t = 0$, and solve for the leading 
variable $x = -z - 2t$. Now to finish, write the solution in vector notation 

\[
\begin{pmatrix} x \\ y \\ z \\ s \\ t \end{pmatrix}
= \begin{pmatrix} -1 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} z
+ \begin{pmatrix} -2 \\ 2 \\ 0 \\ -1 \\ 1 \end{pmatrix} t
\qquad \text{for } z, t \in \mathbb{R}
\]

and recognize that the $\vec{\beta}_1$ and $\vec{\beta}_2$ of the lemma are the vectors associated with 
the free variables $z$ and $t$. 

The prior paragraph is a sketch, not a proof; for instance, a proof would have 
to hold no matter how many equations are in the system. 
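
The bottom-up substitution in the sketch can itself be sketched in code. The following is an illustration under assumptions made here: the input is the list of coefficient rows of a homogeneous system that is already in echelon form, and the function name `homogeneous_generators` is hypothetical. It returns the free columns together with one generating vector per free variable.

```python
from fractions import Fraction

def homogeneous_generators(echelon_rows):
    """Back-substitute through a homogeneous system in echelon form.
    Returns (free_columns, betas), one beta vector per free variable."""
    n = len(echelon_rows[0])
    rows = [[Fraction(e) for e in row] for row in echelon_rows]
    rows = [r for r in rows if any(e != 0 for e in r)]   # drop 0 = 0 rows
    lead = [next(j for j, e in enumerate(r) if e != 0) for r in rows]
    free = [j for j in range(n) if j not in lead]
    # expr[j][g] is the coefficient of free variable g in the expression for x_j.
    expr = {f: {g: Fraction(int(g == f)) for g in free} for f in free}
    for r, l in zip(reversed(rows), reversed(lead)):      # bottom row first
        coeffs = {g: Fraction(0) for g in free}
        for j in range(l + 1, n):
            if r[j] != 0:
                # x_j is free or leads a lower row, so expr[j] is already known.
                for g in free:
                    coeffs[g] -= r[j] * expr[j][g] / r[l]
        expr[l] = coeffs
    betas = [[expr[j][g] for j in range(n)] for g in free]
    return free, betas

# The echelon form system from the sketch, columns ordered x, y, z, s, t.
free, betas = homogeneous_generators([
    [1, 1, 2, 1, 1],
    [0, 1, 1, 1, -1],
    [0, 0, 0, 1, 1],
])
print(free)   # [2, 4]: z and t are free
print([[int(c) for c in b] for b in betas])
# [[-1, -1, 1, 0, 0], [-2, 2, 0, -1, 1]]
```

Like the sketch in the text, this processes the rows from the bottom up, so that each leading variable is expressed using only expressions that were found earlier.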

The second point we will make about the proof concerns its style. The 
above sketch moves row-by-row up the system, using the equations derived for 
the earlier rows to do the next row. This suggests a proof by mathematical 
induction.* Induction is an important and non-obvious proof technique that we 
shall use a number of times in this book. 

We prove a statement by mathematical induction using two steps, a base 
step and an inductive step. In the base step we establish that the statement is 
true for some first instance, here that for the bottom equation we can write the 
leading variable in terms of the free variables. In the inductive step we must 
verify an implication, that if the statement is true for all prior cases then it 
follows for the present case also. Here we will argue that if we can express the 
leading variables from the bottom-most t rows in terms of the free variables then 
we can express the leading variable of the next row up, the $(t+1)$-th row from 
the bottom, in terms of the free variables. Those two steps together prove the 
statement because by the base step it is true for the bottom equation, and by 
the inductive step the fact that it is true for the bottom equation shows that 
it is true for the next one up. Then another application of the inductive step 
implies that it is true for the third equation up, etc. 

* More information on mathematical induction is in the appendix. 

Proof Apply Gauss's Method to get to echelon form. We may get some 
$0 = 0$ equations (if the entire system consists of such equations then the result 
is trivially true) but because the system is homogeneous we cannot get any 
contradictory equations. We will use induction to show this statement: each 
leading variable can be expressed in terms of free variables. That will finish the 
proof because we can then use the free variables as the parameters and the $\vec{\beta}$'s 
are the vectors of coefficients of those free variables. 

For the base step, consider the bottommost equation that is not $0 = 0$. Call 
it equation $m$ so we have 

\[ a_{m,\ell_m} x_{\ell_m} + a_{m,\ell_m+1} x_{\ell_m+1} + \cdots + a_{m,n} x_n = 0 \]

where $a_{m,\ell_m} \neq 0$. (The '$\ell$' means "leading" so that $x_{\ell_m}$ is the leading variable 
in row $m$.) This is the bottom row so any variables $x_{\ell_m+1}, \ldots$ after the leading 
variable in this equation must be free variables. Move these to the right side 
and divide by $a_{m,\ell_m}$ 

\[ x_{\ell_m} = (-a_{m,\ell_m+1}/a_{m,\ell_m}) x_{\ell_m+1} + \cdots + (-a_{m,n}/a_{m,\ell_m}) x_n \]

to express the leading variable in terms of free variables. (If there are no variables 
to the right of $x_{\ell_m}$ then $x_{\ell_m} = 0$; see the "tricky point" following this proof.) 

For the inductive step assume that for the $m$-th equation, and the $(m-1)$-th 
equation, etc., up to and including the $(m-t)$-th equation (where $0 \leq t < m$), 
we can express the leading variable in terms of free variables. We must verify that 
this statement also holds for the next equation up, the $(m-(t+1))$-th equation. 
As in the earlier sketch, take each variable that leads in a lower-down equation 
$x_{\ell_m}, \ldots, x_{\ell_{m-t}}$ and substitute its expression in terms of free variables. (We only 
need do this for the leading variables from lower-down equations because the 
system is in echelon form and so in this equation none of the variables leading 
higher up equations appear.) The result has the form 

\[ a_{m-(t+1),\ell_{m-(t+1)}} x_{\ell_{m-(t+1)}} + \text{linear combination of free variables} = 0 \]

with $a_{m-(t+1),\ell_{m-(t+1)}} \neq 0$. Move the free variables to the right side and divide 
by $a_{m-(t+1),\ell_{m-(t+1)}}$ to end with $x_{\ell_{m-(t+1)}}$ expressed in terms of free variables. 

Because we have shown both the base step and the inductive step, by the 
principle of mathematical induction the proposition is true. QED 

We say that the set $\{ c_1\vec{\beta}_1 + \cdots + c_k\vec{\beta}_k \mid c_1, \ldots, c_k \in \mathbb{R} \}$ is generated by or 
spanned by the set of vectors $\{ \vec{\beta}_1, \ldots, \vec{\beta}_k \}$. 

There is a tricky point to this. We rely on the convention that the sum of an 
empty set of vectors is the zero vector. In particular, we need this in the case 
where a homogeneous system has a unique solution. Then the homogeneous 
case fits the pattern of the other solution sets: in the proof above, we derive the 
solution set by taking the c's to be the free variables and if there is a unique 
solution then there are no free variables. 

Note that the proof shows, as discussed after Example 2.4, that we can always 
parametrize solution sets using the free variables. 

The next lemma finishes the proof of Theorem 3.1 by considering the particular 
solution part of the solution set's description. 

3.7 Lemma For a linear system, where $\vec{p}$ is any particular solution, the solution 
set equals this set. 

\[ \{ \vec{p} + \vec{h} \mid \vec{h} \text{ satisfies the associated homogeneous system} \} \]

Proof We will show mutual set inclusion, that any solution to the system is in 
the above set and that anything in the set is a solution to the system.* 

For set inclusion the first way, that if a vector solves the system then it is in 
the set described above, assume that $\vec{s}$ solves the system. Then $\vec{s} - \vec{p}$ solves the 
associated homogeneous system since for each equation index $i$, 

\[
a_{i,1}(s_1 - p_1) + \cdots + a_{i,n}(s_n - p_n)
= (a_{i,1}s_1 + \cdots + a_{i,n}s_n) - (a_{i,1}p_1 + \cdots + a_{i,n}p_n)
= d_i - d_i = 0
\]

where $p_j$ and $s_j$ are the $j$-th components of $\vec{p}$ and $\vec{s}$. Express $\vec{s}$ in the required 
$\vec{p} + \vec{h}$ form by writing $\vec{s} - \vec{p}$ as $\vec{h}$. 

For set inclusion the other way, take a vector of the form $\vec{p} + \vec{h}$, where $\vec{p}$ 
solves the system and $\vec{h}$ solves the associated homogeneous system and note 
that $\vec{p} + \vec{h}$ solves the given system: for any equation index $i$, 

\[
a_{i,1}(p_1 + h_1) + \cdots + a_{i,n}(p_n + h_n)
= (a_{i,1}p_1 + \cdots + a_{i,n}p_n) + (a_{i,1}h_1 + \cdots + a_{i,n}h_n)
= d_i + 0 = d_i
\]

where $p_j$ and $h_j$ are the $j$-th components of $\vec{p}$ and $\vec{h}$. QED 

The two lemmas above together establish Theorem 3.1. We remember that 
theorem with the slogan, "General = Particular + Homogeneous". 
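
Both inclusions in the proof of Lemma 3.7 can be watched in action on a concrete system. This sketch is an illustration, not part of the text; it reuses the system and solution set of Example 2.5, and the names `A`, `d`, `apply`, and `solution` are assumptions made here.

```python
from fractions import Fraction as F

# Example 2.5, with columns ordered x, y, z, w.
A = [[1, 2, 0, 0],
     [2, 0, 1, 0],
     [3, 2, 1, -1]]
d = [1, 2, 4]

def apply(A, v):
    """Left-hand sides of the equations evaluated at the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def solution(z):
    """Element of the solution set of Example 2.5 for the parameter z."""
    z = F(z)
    return [1 - z / 2, z / 4, z, F(-1)]

p = solution(0)                        # a particular solution
s = solution(6)                        # some other solution
h = [si - pi for si, pi in zip(s, p)]  # their difference

assert apply(A, p) == d
assert apply(A, h) == [0, 0, 0]        # s - p solves the homogeneous system

# Conversely, p plus any multiple of h solves the original system.
p_plus = [pi + 5 * hi for pi, hi in zip(p, h)]
assert apply(A, p_plus) == d
```

The first assertion pair follows the "first way" of the proof and the last one follows the "other way".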

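* More information on equality of sets is in the appendix. 

3.8 Example This system illustrates Theorem 3.1. 

\[
\begin{array}{rcl}
x + 2y - z &=& 1 \\
2x + 4y &=& 2 \\
y - 3z &=& 0
\end{array}
\]

Gauss's Method 

\[
\xrightarrow{-2\rho_1+\rho_2}\;
\begin{array}{rcl}
x + 2y - z &=& 1 \\
2z &=& 0 \\
y - 3z &=& 0
\end{array}
\;\xrightarrow{\rho_2 \leftrightarrow \rho_3}\;
\begin{array}{rcl}
x + 2y - z &=& 1 \\
y - 3z &=& 0 \\
2z &=& 0
\end{array}
\]

shows that the general solution is a singleton set. 

\[ \{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \} \]

That single vector is obviously a particular solution. The associated homogeneous 
system reduces via the same row operations 

\[
\begin{array}{rcl}
x + 2y - z &=& 0 \\
2x + 4y &=& 0 \\
y - 3z &=& 0
\end{array}
\;\xrightarrow{-2\rho_1+\rho_2}\;\xrightarrow{\rho_2 \leftrightarrow \rho_3}\;
\begin{array}{rcl}
x + 2y - z &=& 0 \\
y - 3z &=& 0 \\
2z &=& 0
\end{array}
\]

to also give a singleton set. 

\[ \{ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \} \]

So, as discussed at the start of this subsection, in this single-solution case the 
general solution results from taking the particular solution and adding to it the 
unique solution of the associated homogeneous system. 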

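3.9 Example Also discussed at the start of this subsection is that the case 
where the general solution set is empty also fits the 'General = Particular + 
Homogeneous' pattern. This system illustrates. Gauss's Method 

\[
\begin{array}{rcl}
x + z + w &=& -1 \\
2x - y + w &=& 3 \\
x + y + 3z + 2w &=& 1
\end{array}
\;\xrightarrow[-\rho_1+\rho_3]{-2\rho_1+\rho_2}\;
\begin{array}{rcl}
x + z + w &=& -1 \\
-y - 2z - w &=& 5 \\
y + 2z + w &=& 2
\end{array}
\]

shows that it has no solutions because the final two equations are in conflict. 
The associated homogeneous system has a solution, because all homogeneous 
systems have at least one solution. 

\[
\begin{array}{rcl}
x + z + w &=& 0 \\
2x - y + w &=& 0 \\
x + y + 3z + 2w &=& 0
\end{array}
\;\xrightarrow[-\rho_1+\rho_3]{-2\rho_1+\rho_2}\;\xrightarrow{\rho_2+\rho_3}\;
\begin{array}{rcl}
x + z + w &=& 0 \\
-y - 2z - w &=& 0 \\
0 &=& 0
\end{array}
\]

In fact the solution set of this homogeneous system is infinite. 

\[
\{ \begin{pmatrix} -1 \\ -2 \\ 1 \\ 0 \end{pmatrix} z
+ \begin{pmatrix} -1 \\ -1 \\ 0 \\ 1 \end{pmatrix} w
\mid z, w \in \mathbb{R} \}
\]

However, because no particular solution of the original system exists, the general 
solution set is empty: there are no vectors of the form $\vec{p} + \vec{h}$ because there are 
no $\vec{p}$'s. 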

3.10 Corollary Solution sets of linear systems are either empty, have one element, 
or have infinitely many elements. 

Proof We've seen examples of all three happening so we need only prove that 
there are no other possibilities. 

First, notice a homogeneous system with at least one non-$\vec{0}$ solution $\vec{v}$ has 
infinitely many solutions. This is because the set of multiples of $\vec{v}$ is infinite: if 
$s, t \in \mathbb{R}$ are unequal then $s\vec{v} \neq t\vec{v}$ because $s\vec{v} - t\vec{v} = (s - t)\vec{v}$ is non-$\vec{0}$, since any 
non-0 component of $\vec{v}$ when rescaled by the non-0 factor $s - t$ will give a non-0 
value. 

Now apply Lemma 3.7 to conclude that a solution set 

\[ \{ \vec{p} + \vec{h} \mid \vec{h} \text{ solves the associated homogeneous system} \} \]

is either empty (if there is no particular solution $\vec{p}$), or has one element (if there 
is a $\vec{p}$ and the homogeneous system has the unique solution $\vec{0}$), or is infinite (if 
there is a $\vec{p}$ and the homogeneous system has a non-$\vec{0}$ solution, and thus by the 
prior paragraph has infinitely many solutions). QED 

This table summarizes the factors affecting the size of a general solution. 

                              number of solutions of the homogeneous system 
                              one                  infinitely many 
  particular          yes     unique solution      infinitely many solutions 
  solution exists?    no      no solutions         no solutions 


The dimension on the top of the table is the simpler one. When we perform 
Gauss's Method on a linear system, ignoring the constants on the right side and 
so paying attention only to the coefficients on the left-hand side, we either end 
with every variable leading some row or else we find that some variable does not 
lead a row, that is, we find that some variable is free. (We formalize "ignoring 
the constants on the right" by considering the associated homogeneous system.) 
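
The table can be turned into a small decision procedure. The sketch below is an illustration only; the function name `classify` and the augmented-row input format are assumptions made here. It reduces the system, reports "empty" if a contradictory equation appears, and otherwise decides between one and infinitely many solutions by asking whether every variable leads a row.

```python
from fractions import Fraction

def classify(augmented):
    """Return 'empty', 'one element', or 'infinitely many' for the solution
    set of the system given by augmented rows (coefficients, then constant)."""
    m = [[Fraction(e) for e in row] for row in augmented]
    n_vars = len(m[0]) - 1
    pivot_row = 0
    for col in range(n_vars):
        rows = [r for r in range(pivot_row, len(m)) if m[r][col] != 0]
        if not rows:
            continue                      # this variable is free
        m[pivot_row], m[rows[0]] = m[rows[0]], m[pivot_row]
        for below in range(pivot_row + 1, len(m)):
            factor = m[below][col] / m[pivot_row][col]
            m[below] = [b - factor * p for b, p in zip(m[below], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    # Rows from pivot_row down have all-zero coefficients; a nonzero
    # constant there is a contradictory equation.
    if any(row[-1] != 0 for row in m[pivot_row:]):
        return "empty"
    return "one element" if pivot_row == n_vars else "infinitely many"

print(classify([[1, 2, -1, 1], [2, 4, 0, 2], [0, 1, -3, 0]]))            # Example 3.8
print(classify([[1, 0, 1, 1, -1], [2, -1, 0, 1, 3], [1, 1, 3, 2, 1]]))  # Example 3.9
```

Note that the choice between "one element" and "infinitely many" depends only on the coefficients, which is the sense in which the top dimension of the table is the simpler one.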

A notable special case is systems having the same number of equations as 
unknowns. Such a system will have a solution, and that solution will be unique, 
if and only if it reduces to an echelon form system where every variable leads its 
row (since there are the same number of variables as rows), which will happen if 
and only if the associated homogeneous system has a unique solution. 
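
The three cases in the corollary and the table can be checked mechanically. The following is a minimal Python sketch, not part of the text's development: the names `echelon_pivots` and `classify` are hypothetical, and the sample systems in the usage note are chosen only for illustration. It reduces the augmented matrix to echelon form with exact fractions and reads off the case from which columns lead a row.

```python
from fractions import Fraction


def echelon_pivots(rows):
    """Reduce rows to echelon form; return (reduced rows, list of leading columns)."""
    m = [[Fraction(x) for x in r] for r in rows]
    pivots = []
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no row leads in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def classify(coeffs, consts):
    """Return 'none', 'unique', or 'infinite' for the system with these coefficients and constants."""
    n = len(coeffs[0])
    aug = [list(row) + [k] for row, k in zip(coeffs, consts)]
    _, pivots = echelon_pivots(aug)
    if n in pivots:
        # A leading entry in the constants column is a row "0 = nonzero".
        return "none"
    # Every variable leads a row exactly when there are n leading columns.
    return "unique" if len(pivots) == n else "infinite"
```

For example, `classify([[1, 2], [3, 4]], [5, 6])` returns `'unique'`, while the coefficient rows `[[1, 2], [3, 6]]` give `'none'` with constants `[1, 2]` and `'infinite'` with constants `[1, 3]`.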

3.11 Definition A square matrix is nonsingular if it is the matrix of coefficients 
of a homogeneous system with a unique solution. It is singular otherwise, that 
is, if it is the matrix of coefficients of a homogeneous system with infinitely many 
solutions. 

3.12 Example The first of these matrices is nonsingular while the second is 
singular 

\[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \qquad \begin{pmatrix} 1 & 2 \\ 3 & 6 \end{pmatrix} \]

because the first of these homogeneous systems has a unique solution while the 
second has infinitely many solutions. 

 x + 2y = 0         x + 2y = 0
3x + 4y = 0        3x + 6y = 0

We have made the distinction in the definition because a system with the same
number of equations as variables behaves in one of two ways, depending on 
whether its matrix of coefficients is nonsingular or singular. A system where the 
matrix of coefficients is nonsingular has a unique solution for any constants on 
the right side: for instance, Gauss's Method shows that this system 

 x + 2y = a
3x + 4y = b

has the unique solution x = b − 2a and y = (3a − b)/2. On the other hand, a
system where the matrix of coefficients is singular never has a unique solution:
it has either no solutions or else has infinitely many, as with these.

 x + 2y = 1         x + 2y = 1
3x + 6y = 2        3x + 6y = 3

We use the word singular because it means "departing from general expectation"
and people often, naively, expect that systems with the same number of
variables as equations will have a unique solution. Thus, we can think of the 
word as connoting "troublesome," or at least "not ideal." (That 'singular' applies 
to those systems that do not have one solution is ironic, but it is the standard 
term.) 

3.13 Example The systems from Example 3.3, Example 3.4, and Example 3.8 
each have an associated homogeneous system with a unique solution. Thus these 
matrices are nonsingular. 






The Chemistry problem from Example 3.5 is a homogeneous system with more 
than one solution so its matrix is singular. 

\[ \begin{pmatrix} 7 & 0 & -7 & 0 \\ 8 & 1 & -5 & -2 \\ 0 & 1 & -3 & 0 \\ 0 & 3 & -6 & -1 \end{pmatrix} \]

The above table has two dimensions. We have considered the one on top: we 
can tell into which column a given linear system goes solely by considering the 
system's left-hand side — the constants on the right-hand side play no role in 
this factor. 

The table's other dimension, determining whether a particular solution exists, 
is tougher. Consider these two 



3x + 2y = 5        3x + 2y = 5
3x + 2y = 5        3x + 2y = 4

with the same left sides but different right sides. The first has a solution while the
second does not, so here the constants on the right side decide if the system has 
a solution. We could conjecture that the left side of a linear system determines 
the number of solutions while the right side determines if solutions exist but 
that guess is not correct. Compare these two systems 

3x + 2y = 5        3x + 2y = 5
4x + 2y = 4        3x + 2y = 4

with the same right sides but different left sides. The first has a solution but the 
second does not. Thus the constants on the right side of a system don't decide 
alone whether a solution exists; rather, it depends on some interaction between 
the left and right sides. 

For some intuition about that interaction, consider this system with one of 
the coefficients left as the parameter c. 

 x + 2y + 3z = 1
 x +  y +  z = 1
cx + 3y + 4z = 0

If c = 2 then this system has no solution because the left-hand side has the
third row as a sum of the first two, while the right-hand side does not. If c ≠ 2
then this system has a unique solution (try it with c = 1). For a system to
have a solution, if one row of the matrix of coefficients on the left is a linear
combination of other rows, then on the right the constant from that row must
be the same combination of constants from the same rows.
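
The interaction between the two sides can be seen directly in a few lines of Python. This is only an illustrative sketch; the helper names are hypothetical, and the solution (2, −2, 1) for c = 1 was found by hand and is merely verified here by substitution.

```python
ROW1, ROW2 = [1, 2, 3], [1, 1, 1]   # left sides of the first two equations
CONSTS = [1, 1, 0]                  # right sides of the three equations


def left_is_sum_of_first_two(c):
    """Is the third left-hand row (c, 3, 4) the sum of the first two rows?"""
    return [c, 3, 4] == [a + b for a, b in zip(ROW1, ROW2)]


def right_is_sum_of_first_two():
    """Is the third constant the same combination of the first two constants?"""
    return CONSTS[2] == CONSTS[0] + CONSTS[1]


def satisfies(c, x, y, z):
    """Check a candidate solution against all three equations."""
    rows = [ROW1, ROW2, [c, 3, 4]]
    return all(r[0] * x + r[1] * y + r[2] * z == k for r, k in zip(rows, CONSTS))
```

With c = 2 the left sides are dependent but the constants do not follow the same combination, so no solution can exist; with c = 1 the call `satisfies(1, 2, -2, 1)` confirms a solution.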

More intuition about the interaction comes from studying linear combinations. 
That will be our focus in the second chapter, after we finish the study of Gauss's 
Method itself in the rest of this chapter. 

Exercises 

/ 3.14 Solve each system. Express the solution set using vectors. Identify the particular 
solution and the solution set of the homogeneous system. (These systems also 
appear in Exercise 18.) 

(a) 3x + 6y = 18        (b) x + y = 1             (c)  x1      +  x3 = 4
     x + 2y = 6             x − y = −1                 x1 − x2 + 2x3 = 5
                                                      4x1 − x2 + 5x3 = 17

(d) 2a + b − c = 2      (e) x + 2y − z     = 3    (f)  x     + z + w = 4
    2a     + c = 3         2x +  y     + w = 4        2x + y     − w = 2
     a − b     = 0          x −  y + z + w = 1        3x + y + z     = 7

3.15 Solve each system, giving the solution set in vector notation. Identify the 
particular solution and the solution of the homogeneous system. 

(a) 2x + y − z = 1      (b) x      −  z     = 1    (c)  x −  y +  z     = 0
    4x − y     = 3              y + 2z − w = 3               y      + w = 0
                            x + 2y + 3z − w = 7         3x − 2y + 3z + w = 0
                                                            −y      − w = 0

(d)  a + 2b + 3c + d − e = 1
    3a −  b +  c + d + e = 3

/ 3.16 For the system



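2x −  y     −  w = 3
      y + z + 2w = 2
 x − 2y − z      = −1

which of these can be used as the particular solution part of some general solution?

(a) \begin{pmatrix} 0 \\ -3 \\ 5 \\ 0 \end{pmatrix}    (b) \begin{pmatrix} 2 \\ 1 \\ 1 \\ 0 \end{pmatrix}    (c) \begin{pmatrix} -1 \\ -4 \\ 8 \\ -1 \end{pmatrix}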







/ 3.17 Lemma 3.7 says that we can use any particular solution for p. Find, if possible, 
a general solution to this system 

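 x −  y     + w = 4
2x + 3y − z     = 0
      y + z + w = 4

that uses the given vector as its particular solution.

(a) \begin{pmatrix} 0 \\ 0 \\ 0 \\ 4 \end{pmatrix}    (b) \begin{pmatrix} -5 \\ 1 \\ -7 \\ 10 \end{pmatrix}    (c) \begin{pmatrix} 2 \\ -1 \\ 1 \\ 1 \end{pmatrix}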



3.18 One is nonsingular while the other is singular. Which is which? 



(a) \begin{pmatrix} 1 & 3 \\ 4 & -12 \end{pmatrix}    (b) \begin{pmatrix} 1 & 3 \\ 4 & 12 \end{pmatrix}



/ 3.19 Singular or nonsingular? 

(a) (] I) (b) ^ ^ ' 



(d) 



-3 -6 



(c) 



(Careful!) 







2 




j (e) 


(J 





i) 






1 





/ 3.20 Is the given vector in the set generated by the given set? 




3.21 Prove that any linear system with a nonsingular matrix of coefficients has a 
solution, and that the solution is unique. 

3.22 In the proof of Lemma 3.6, what happens if there are no non-'0 = 0' equations?

/ 3.23 Prove that if s and t satisfy a homogeneous system then so do these vectors.






(a) s + t (b) 3s (c) ks + mt for k, m ∈ ℝ
What's wrong with this argument: "These three show that if a homogeneous system 
has one solution then it has many solutions — any multiple of a solution is another 
solution, and any sum of solutions is a solution also — so there are no homogeneous 
systems with exactly one solution."? 
3.24 Prove that if a system with only rational coefficients and constants has a
solution then it has at least one all-rational solution. Must it have infinitely many?

II Linear Geometry

If you have seen the elements of vectors before then this section is an
optional review. However, later work will refer to this material so if this is 
not a review then it is not optional. 

In the first section, we had to do a bit of work to show that there are only
three types of solution sets: singleton, empty, and infinite. But in the special
case of systems with two equations and two unknowns this is easy to see with a
picture. Draw each two-unknowns equation as a line in the plane and then the
two lines could have a unique intersection, be parallel, or be the same line.

Unique solution        No solutions         Infinitely many solutions

3x + 2y = 7            3x + 2y = 7          3x + 2y = 7
 x −  y = −1           3x + 2y = 4          6x + 4y = 14



These pictures aren't a short way to prove the results from the prior section, 
because those apply to any number of linear equations and any number of 
unknowns. But they do help us understand those results. This section develops 
the ideas that we need to express our results geometrically. In particular, while 
the two-dimensional case is familiar, to extend to systems with more than two
unknowns we shall need some higher-dimensional geometry. 



II.1 Vectors in Space

"Higher-dimensional geometry" sounds exotic. It is exotic — interesting and 
eye-opening. But it isn't distant or unreachable. 

We begin by defining one-dimensional space to be ℝ. To see that the
definition is reasonable, we picture a one-dimensional space and make a
correspondence with ℝ by picking a point to label 0 and another to
label 1.

Now, with a scale and a direction, finding the point corresponding to, say, +2.17,
is easy: start at 0 and head in the direction of 1, but don't stop there, go 2.17
times as far.

The basic idea here, combining magnitude with direction, is the key to
extending to higher dimensions.

An object consisting of a magnitude and a direction is a vector (we use the
same word as in the prior section because we shall show below how to describe
such an object with a column vector). We can draw a vector as having some
length, and pointing somewhere.




There is a subtlety here — these vectors 



are equal, even though they start in different places, because they have equal 
lengths and equal directions. Again: those vectors are not just alike, they are 
equal. 

How can things that are in different places be equal? Think of a vector as 
representing a displacement (the word vector is Latin for "carrier" or "traveler"). 
These two squares undergo equal displacements, despite that those displacements 
start in different places. 






Sometimes, to emphasize this property vectors have of not being anchored, we 
can refer to them as free vectors. Thus, these free vectors are equal as each is a 
displacement of one over and two up. 



More generally, vectors in the plane are the same if and only if they have the
same change in first components and the same change in second components: the
vector extending from (a1, a2) to (b1, b2) equals the vector from (c1, c2) to
(d1, d2) if and only if b1 − a1 = d1 − c1 and b2 − a2 = d2 − c2.

Saying 'the vector that, were it to start at (a1, a2), would extend to (b1, b2)'
would be unwieldy. We instead describe that vector as

\[ \begin{pmatrix} b_1 - a_1 \\ b_2 - a_2 \end{pmatrix} \]

so that the 'one over and two up' arrows shown above picture this vector.

\[ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \]
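
The equality test for free vectors is short enough to write out. The sketch below is an illustration only, with hypothetical function names; it represents points as tuples and compares the component-wise changes.

```python
def displacement(start, end):
    """Components of the free vector that, starting at `start`, ends at `end`."""
    if len(start) != len(end):
        raise ValueError("points must have the same number of coordinates")
    return tuple(e - s for s, e in zip(start, end))


def same_free_vector(a, b, c, d):
    """True when the vector from a to b equals the vector from c to d."""
    return displacement(a, b) == displacement(c, d)
```

For instance, the vector from (0, 0) to (1, 2) and the vector from (3, 1) to (4, 3) are both 'one over and two up', so `same_free_vector((0, 0), (1, 2), (3, 1), (4, 3))` is `True`.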



We often draw the arrow as starting at the origin, and we then say it is in the
canonical position (or natural position or standard position). When the
vector

\[ \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \]

is in canonical position then it extends to the endpoint (v1, v2).
We typically just refer to "the point

\[ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \]"

rather than "the endpoint of the canonical position of" that vector. Thus, we
will call each of these ℝ².



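\[ \{ (x_1, x_2) \mid x_1, x_2 \in \mathbb{R} \} \qquad \{ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mid x_1, x_2 \in \mathbb{R} \} \]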






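In the prior section we defined vectors and vector operations with an algebraic
motivation;

\[ r \cdot \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} rv_1 \\ rv_2 \end{pmatrix} \qquad \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} + \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \end{pmatrix} \]

we can now understand those operations geometrically. For instance, if
v represents a displacement then 3v represents a displacement in the same
direction but three times as far, and −1v represents a displacement of the same
distance as v but in the opposite direction.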



3v 



And, where v and w represent displacements, v+w represents those displacements 
combined. 




The long arrow is the combined displacement in this sense: if, in one minute, a 
ship's motion gives it the displacement relative to the earth of v and a passenger's 
motion gives a displacement relative to the ship's deck of w, then v + w is the 
displacement of the passenger relative to the earth. 

Another way to understand the vector sum is with the parallelogram rule. 
Draw the parallelogram formed by the vectors v and w. Then the sum v + w 
extends along the diagonal to the far corner. 



The above drawings show how vectors and vector operations behave in ℝ².
We can extend to ℝ³, or to even higher-dimensional spaces where we have no
pictures, with the obvious generalization: the free vector that, if it starts at
(a1, ..., an), ends at (b1, ..., bn), is represented by this column.

\[ \begin{pmatrix} b_1 - a_1 \\ \vdots \\ b_n - a_n \end{pmatrix} \]

Vectors are equal if they have the same representation. We aren't too careful
about distinguishing between a point and the vector whose canonical representation
ends at that point.

\[ \mathbb{R}^n = \{ \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} \mid v_1, \ldots, v_n \in \mathbb{R} \} \]



And, we do addition and scalar multiplication component-wise. 

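Having considered points, we now turn to the lines. In ℝ², the line through
(1, 2) and (3, 1) consists of (the endpoints of) the vectors in this set.

\[ \{ \begin{pmatrix} 1 \\ 2 \end{pmatrix} + t \begin{pmatrix} 2 \\ -1 \end{pmatrix} \mid t \in \mathbb{R} \} \]

That description expresses this picture.

\[ \begin{pmatrix} 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix} - \begin{pmatrix} 1 \\ 2 \end{pmatrix} \]

The vector associated with the parameter t

\[ \begin{pmatrix} 2 \\ -1 \end{pmatrix} \]

has its whole body in the line; it is a direction vector for the line. Note that
points on the line to the left of x = 1 are described using negative values of t.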
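
As a small illustration of the parametric description, this Python sketch builds the direction vector from two points and evaluates the line at a parameter value. The name `line_through` is hypothetical and the code is only a sketch of the idea above.

```python
def line_through(p, q):
    """Return (direction vector, function t -> point) for the line through p and q."""
    direction = tuple(b - a for a, b in zip(p, q))

    def point_at(t):
        # p + t * direction, computed component-wise
        return tuple(a + t * d for a, d in zip(p, direction))

    return direction, point_at
```

For the line through (1, 2) and (3, 1) the direction vector is (2, −1); the parameter t = 0 gives the first point, t = 1 gives the second, and t = −1 gives (−1, 3), which lies to the left of x = 1.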

In ℝ³, the line through (1, 2, 1) and (2, 3, 2) is the set of (endpoints of) vectors
of this form

\[ \{ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + t \cdot \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \mid t \in \mathbb{R} \} \]

and lines in even higher-dimensional spaces work in the same way.

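In ℝ³, a line uses one parameter so that a particle on that line is free to
move back and forth in one dimension, and a plane involves two parameters.
For example, the plane through the points (1, 0, 5), (2, 1, −3), and (−2, 4, 0.5)
consists of (endpoints of) the vectors in

\[ \{ \begin{pmatrix} 1 \\ 0 \\ 5 \end{pmatrix} + t \begin{pmatrix} 1 \\ 1 \\ -8 \end{pmatrix} + s \begin{pmatrix} -3 \\ 4 \\ -4.5 \end{pmatrix} \mid t, s \in \mathbb{R} \} \]

(the column vectors associated with the parameters

\[ \begin{pmatrix} 1 \\ 1 \\ -8 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \\ -3 \end{pmatrix} - \begin{pmatrix} 1 \\ 0 \\ 5 \end{pmatrix} \qquad \begin{pmatrix} -3 \\ 4 \\ -4.5 \end{pmatrix} = \begin{pmatrix} -2 \\ 4 \\ 0.5 \end{pmatrix} - \begin{pmatrix} 1 \\ 0 \\ 5 \end{pmatrix} \]

are two vectors whose whole bodies lie in the plane). As with the line, note that
we describe some points in this plane with negative t's or negative s's or both.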
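
The same construction works for a plane through three points. The following sketch, with the hypothetical name `plane_through`, takes the first point as the offset and uses the differences to the other two points as the vectors attached to the parameters.

```python
def plane_through(p, q, r):
    """Return (v, w, function (t, s) -> point) for the plane p + t*v + s*w."""
    v = tuple(b - a for a, b in zip(p, q))   # from p to q
    w = tuple(b - a for a, b in zip(p, r))   # from p to r

    def point_at(t, s):
        return tuple(a + t * x + s * y for a, x, y in zip(p, v, w))

    return v, w, point_at
```

Applied to (1, 0, 5), (2, 1, −3), and (−2, 4, 0.5) it produces the vectors (1, 1, −8) and (−3, 4, −4.5), and the parameter pairs (1, 0) and (0, 1) recover the second and third points.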

In algebra and calculus we often use a description of planes involving a single 
equation as the condition that describes the relationship among the first, second, 
and third coordinates of points in a plane. 




\[ P = \{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \mid 2x + y + z = 4 \} \]



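The translation from such a description to the vector description that we favor
in this book is to think of the condition as a one-equation linear system and
parametrize x = 2 − y/2 − z/2.

\[ P = \{ \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} + y \cdot \begin{pmatrix} -1/2 \\ 1 \\ 0 \end{pmatrix} + z \cdot \begin{pmatrix} -1/2 \\ 0 \\ 1 \end{pmatrix} \mid y, z \in \mathbb{R} \} \]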
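
One way to gain confidence in such a translation is to check that every point produced by the parametrization satisfies the original equation. This is a minimal sketch with hypothetical names, using exact fractions so that the halves cause no rounding.

```python
from fractions import Fraction


def plane_point(y, z):
    """Point of the plane 2x + y + z = 4 given the free values y and z."""
    half = Fraction(1, 2)
    return (2 - half * y - half * z, Fraction(y), Fraction(z))


def on_plane(point):
    """Does the point satisfy 2x + y + z = 4?"""
    x, y, z = point
    return 2 * x + y + z == 4
```

Taking y = 1, z = 0 gives the point (3/2, 1, 0), and any choice of the two parameters passes the `on_plane` check.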



Generalizing, a set of the form {p + t1v1 + t2v2 + ··· + tkvk | t1, ..., tk ∈ ℝ}
where v1, ..., vk ∈ ℝⁿ and k ≤ n is a k-dimensional linear surface (or k-flat).
For example, in ℝ⁴



{ … | t ∈ ℝ}

is a line,

{ … | t, s ∈ ℝ}

is a plane, and

{ … | r, s, t ∈ ℝ}



is a three-dimensional linear surface. Again, the intuition is that a line permits
motion in one direction, a plane permits motion in combinations of two directions,
etc. (When the dimension of the linear surface is one less than the dimension of
the space, that is, when we have an (n − 1)-flat in ℝⁿ, then the surface is called a
hyperplane.)

A description of a linear surface can be misleading about the dimension. For
example, this

{ … | t, s ∈ ℝ}

is a degenerate plane because it is actually a line, since the vectors are multiples
of each other so we can merge the two into one.

{ … | r ∈ ℝ}



We shall see in the Linear Independence section of Chapter Two what relationships
among vectors cause the linear surface they generate to be degenerate.

We finish this subsection by restating our conclusions from earlier in geometric 
terms. First, the solution set of a linear system with n unknowns is a linear
surface in ℝⁿ. Specifically, it is a k-dimensional linear surface, where k is the
number of free variables in an echelon form version of the system. Second, the 
solution set of a homogeneous linear system is a linear surface passing through 
the origin. Finally, we can view the general solution set of any linear system 
as being the solution set of its associated homogeneous system offset from the 
origin by a vector, namely by any particular solution. 

Exercises 

/ 1.1 Find the canonical name for each vector.
(a) the vector from (2, 1) to (4, 2) in ℝ²

(b) the vector from (3, 3) to (2, 5) in ℝ²

(c) the vector from (1, 0, 6) to (5, 0, 3) in ℝ³

(d) the vector from (6, 8, 8) to (6, 8, 8) in ℝ³
/ 1.2 Decide if the two vectors are equal. 

(a) the vector from (5,3) to (6,2) and the vector from (1,-2) to (1,1) 

(b) the vector from (2, 1 , 1 ) to (3, 0, 4) and the vector from (5, 1,4) to (6, 0, 7) 
/ 1.3 Does (1,0,2, 1) lie on the line through (-2,1,1,0) and (5,10,-1,4)? 

/ 1.4 (a) Describe the plane through (1,1,5,-1), (2,2,2,0), and (3,1,0,4). 
(b) Is the origin in that plane? 
1.5 Describe the plane that contains this point and line. 




/ 1.6 Intersect these planes 



k, m e Rj 



/ 1.7 Intersect each pair, if possible. 





s e R} 



s,w e R} 



1.8 When a plane does not pass through the origin, performing operations on vectors 
whose bodies lie in it is more complicated than when the plane passes through the 
origin. Consider the picture in this subsection of the plane 

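\[ \{ \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} + y \cdot \begin{pmatrix} -1/2 \\ 1 \\ 0 \end{pmatrix} + z \cdot \begin{pmatrix} -1/2 \\ 0 \\ 1 \end{pmatrix} \mid y, z \in \mathbb{R} \} \]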

and the three vectors with endpoints (2,0,0), (1.5,1,0), and (1.5,0,1). 

(a) Redraw the picture, including the vector in the plane that is twice as long 
as the one with endpoint (1.5, 1,0). The endpoint of your vector is not (3,2,0); 
what is it? 

(b) Redraw the picture, including the parallelogram in the plane that shows the 
sum of the vectors ending at (1 .5, 0, 1 ) and (1 .5, 1,0). The endpoint of the sum, 
on the diagonal, is not (3, 1 , 1 ); what is it? 

1.9 Show that the line segments (a1, a2)(b1, b2) and (c1, c2)(d1, d2) have the same
lengths and slopes if b1 − a1 = d1 − c1 and b2 − a2 = d2 − c2. Is that only if?

1.10 How should we define ℝ⁰?

? / 1.11 [Math. Mag., Jan. 1957] A person traveling eastward at a rate of 3 miles per 
hour finds that the wind appears to blow directly from the north. On doubling his 
speed it appears to come from the north east. What was the wind's velocity? 
1.12 Euclid describes a plane as "a surface which lies evenly with the straight lines 
on itself". Commentators such as Heron have interpreted this to mean, "(A plane 
surface is) such that, if a straight line pass through two points on it, the line 
coincides wholly with it at every spot, all ways". (Translations from [Heath], pp. 
171-172.) Do planes, as described in this section, have that property? Does this 
description adequately define planes?

II.2 Length and Angle Measures



We've translated the first section's results about solution sets into geometric
terms, to better understand those sets. But we must be careful not to be misled
by our own terms: labeling subsets of ℝⁿ of the forms {p + tv | t ∈ ℝ} and
{p + tv + sw | t, s ∈ ℝ} as 'lines' and 'planes' doesn't make them act like the
lines and planes of our past experience. Rather, we must ensure that the names
suit the sets. While we can't prove that the sets satisfy our intuition (we
can't prove anything about intuition), in this subsection we'll observe that a
result familiar from ℝ² and ℝ³, when generalized to arbitrary ℝⁿ, supports the
idea that a line is straight and a plane is flat. Specifically, we'll see how to do
Euclidean geometry in a "plane" by giving a definition of the angle between two
vectors in the plane that they generate.

2.1 Definition The length (or norm) of a vector v ∈ ℝⁿ is the square root of the
sum of the squares of its components.

\[ \|v\| = \sqrt{v_1^2 + \cdots + v_n^2} \]



2.2 Remark This is a natural generalization of the Pythagorean Theorem. A 
classic discussion is in [Polya]. 

Note that for any nonzero v, the vector v/||v|| has length one. We say that 
the second vector normalizes v to length one. 
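
In code, the definition and the normalizing step look like this. It is a small Python sketch with hypothetical function names, using floating point, so results are approximate in general.

```python
import math


def length(v):
    """Square root of the sum of the squares of the components."""
    return math.sqrt(sum(c * c for c in v))


def normalize(v):
    """Rescale a nonzero vector to length one."""
    n = length(v)
    if n == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(c / n for c in v)
```

For example, `length((3, 4))` is `5.0`, and `normalize((1, 2, 2))` has length one up to rounding.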

We can use that to get a formula for the angle between two vectors. Consider
two vectors in ℝ³ where neither is a multiple of the other



(the special case of multiples will prove below not to be an exception). They
determine a two-dimensional plane: for instance, put them in canonical position
and take the plane formed by the origin and the endpoints. In that plane consider
the triangle with sides u, v, and u − v.







Apply the Law of Cosines: ||u − v||² = ||u||² + ||v||² − 2 ||u|| ||v|| cos θ, where θ
is the angle between the vectors. The left side gives

\[ (u_1 - v_1)^2 + (u_2 - v_2)^2 + (u_3 - v_3)^2 = (u_1^2 - 2u_1v_1 + v_1^2) + (u_2^2 - 2u_2v_2 + v_2^2) + (u_3^2 - 2u_3v_3 + v_3^2) \]

while the right side gives this.

\[ (u_1^2 + u_2^2 + u_3^2) + (v_1^2 + v_2^2 + v_3^2) - 2\,\|u\|\,\|v\| \cos\theta \]

Canceling squares u1², ..., v3² and dividing by 2 gives the formula.

\[ \theta = \arccos\left( \frac{u_1v_1 + u_2v_2 + u_3v_3}{\|u\|\,\|v\|} \right) \]

To give a definition of angle that works in higher dimensions we cannot draw 
pictures but we can make the argument analytically. 

First, the form of the numerator is clear: it comes from the middle terms
of (ui − vi)².

2.3 Definition The dot product (or inner product or scalar product) of two
n-component real vectors is the linear combination of their components.

\[ u \cdot v = u_1v_1 + u_2v_2 + \cdots + u_nv_n \]

Note that the dot product of two vectors is a real number, not a vector, and
that the dot product of a vector from ℝⁿ with a vector from ℝᵐ is not defined
unless n equals m. Note also this relationship between dot product and length:

u·u = u1u1 + ··· + unun = ||u||².
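
The definition translates directly into a few lines of Python. The function name `dot` is an assumption of this sketch; it rejects vectors of different sizes, matching the remark that the product is then undefined.

```python
def dot(u, v):
    """Sum of the products of corresponding components."""
    if len(u) != len(v):
        raise ValueError("dot product needs vectors with the same number of components")
    return sum(a * b for a, b in zip(u, v))
```

The relationship with length shows up as `dot((3, 4), (3, 4))` giving 25, the square of the length 5.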

2.4 Remark Some authors require that the first vector be a row vector and that 
the second vector be a column vector. We shall not be that strict and will allow 
the dot product operation between two column vectors. 

Still reasoning with letters but guided by the pictures, we use the next
theorem to argue that the triangle formed by u, v, and u − v in ℝⁿ lies in the
planar subset of ℝⁿ generated by u and v.

2.5 Theorem (Triangle Inequality) For any u, v ∈ ℝⁿ,

||u + v|| ≤ ||u|| + ||v||

with equality if and only if one of the vectors is a nonnegative scalar multiple of
the other one.

This is the source of the familiar saying, "The shortest distance between two 
points is in a straight line." 

Proof (We'll use some algebraic properties of dot product that we have not yet
checked, for instance that u·(a + b) = u·a + u·b and that u·v = v·u. See
Exercise 18.) Since all the numbers are nonnegative, the inequality holds if and only
if its square holds.

\[ \begin{aligned}
\|u + v\|^2 &\le (\|u\| + \|v\|)^2 \\
(u + v) \cdot (u + v) &\le \|u\|^2 + 2\,\|u\|\,\|v\| + \|v\|^2 \\
u \cdot u + u \cdot v + v \cdot u + v \cdot v &\le u \cdot u + 2\,\|u\|\,\|v\| + v \cdot v \\
2\,u \cdot v &\le 2\,\|u\|\,\|v\|
\end{aligned} \]

That, in turn, holds if and only if the relationship obtained by multiplying both
sides by the nonnegative numbers ||u|| and ||v||

\[ 2\,(\|v\|\,u) \cdot (\|u\|\,v) \le 2\,\|u\|^2\,\|v\|^2 \]

and rewriting

\[ 0 \le \|u\|^2\,\|v\|^2 - 2\,(\|v\|\,u) \cdot (\|u\|\,v) + \|u\|^2\,\|v\|^2 \]

is true. But factoring shows that it is true

\[ 0 \le (\|u\|\,v - \|v\|\,u) \cdot (\|u\|\,v - \|v\|\,u) \]

since it only says that the square of the length of the vector ||u|| v − ||v|| u is
not negative. As for equality, it holds when, and only when, ||u|| v − ||v|| u is
0. The check that ||u|| v = ||v|| u if and only if one vector is a nonnegative real
scalar multiple of the other is easy. QED

This result supports the intuition that even in higher-dimensional spaces, 
lines are straight and planes are flat. We can easily check from the definition 
that linear surfaces have the property that for any two points in that surface, 
the line segment between them is contained in that surface. But if the linear 
surface were not flat then that would allow for a shortcut. 




Because the Triangle Inequality says that in any ℝⁿ the shortest cut between
two endpoints is simply the line segment connecting them, linear surfaces have 
no bends. 

Back to the definition of angle measure. The heart of the Triangle Inequality's
proof is the u·v ≤ ||u|| ||v|| line. We might wonder if some pairs of vectors
satisfy the inequality in this way: while u·v is a large number, with absolute
value bigger than the right-hand side, it is a negative large number. The next
result says that does not happen.

2.6 Corollary (Cauchy-Schwarz Inequality) For any u, v ∈ ℝⁿ,

|u·v| ≤ ||u|| ||v||

with equality if and only if one vector is a scalar multiple of the other.

Proof The Triangle Inequality's proof shows that u·v ≤ ||u|| ||v|| so if u·v is
positive or zero then we are done. If u·v is negative then this holds.

|u·v| = −(u·v) = (−u)·v ≤ ||−u|| ||v|| = ||u|| ||v||

The equality condition is Exercise 19. QED
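
A numeric spot check is no substitute for the proof, but it can make the inequality concrete. The sketch below, with hypothetical names, works with the squared form (u·v)² ≤ (u·u)(v·v) so that integer inputs are handled exactly.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cauchy_schwarz_gap(u, v):
    """(u.u)(v.v) - (u.v)^2, which the inequality says is never negative."""
    return dot(u, u) * dot(v, v) - dot(u, v) ** 2
```

The gap is zero for the pair (1, 2, 3) and (−2, −4, −6), where one vector is a scalar multiple of the other, and positive for a pair such as (1, 1, 0) and (0, 3, 2).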

The Cauchy-Schwarz inequality assures us that the next definition makes
sense because the fraction has absolute value less than or equal to one. 

2.7 Definition The angle between two nonzero vectors u, v ∈ ℝⁿ is

\[ \theta = \arccos\left( \frac{u \cdot v}{\|u\|\,\|v\|} \right) \]

(by definition, the angle between the zero vector and any other vector is right).
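
The definition can be sketched in Python as follows. The function name `angle` is hypothetical; the clamp to the interval [−1, 1] is an added safeguard against floating-point rounding, not part of the definition, and the zero-vector case follows the convention just stated.

```python
import math


def angle(u, v):
    """Angle in radians between two vectors of the same size."""
    dot = sum(a * b for a, b in zip(u, v))
    len_u = math.sqrt(sum(a * a for a in u))
    len_v = math.sqrt(sum(b * b for b in v))
    if len_u == 0 or len_v == 0:
        return math.pi / 2  # convention: the zero vector is at a right angle to everything
    cosine = dot / (len_u * len_v)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding just outside [-1, 1]
    return math.acos(cosine)
```

For the vectors (1, 1, 0) and (0, 3, 2) this gives about 0.94 radians, and for (1, −1) and (1, 1) it gives a right angle.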

2.8 Corollary Vectors from ℝⁿ are orthogonal, that is, perpendicular, if and only
if their dot product is zero. 

2.9 Example These vectors are orthogonal. 



\[ \begin{pmatrix} 1 \\ -1 \end{pmatrix} \qquad \begin{pmatrix} 1 \\ 1 \end{pmatrix} \]



We've drawn the arrows away from canonical position but nevertheless the 
vectors are orthogonal. 

2.10 Example The angle formula given at the start of this subsection is a
special case of the definition. Between these two

\[ \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \qquad \begin{pmatrix} 0 \\ 3 \\ 2 \end{pmatrix} \]

the angle is

\[ \arccos\left( \frac{(1)(0) + (1)(3) + (0)(2)}{\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{0^2 + 3^2 + 2^2}} \right) = \arccos\left( \frac{3}{\sqrt{2}\,\sqrt{13}} \right) \]

approximately 0.94 radians. Notice that these vectors are not orthogonal. Although
the yz-plane may appear to be perpendicular to the xy-plane, in fact
the two planes are that way only in the weak sense that there are vectors in each
orthogonal to all vectors in the other. Not every vector in each is orthogonal to
all vectors in the other.



Exercises 

/ 2.11 Find the length of each vector. 



f] (b) ( 2 ) I 1 I (d) I I (e) 



/ 2.12 Find the angle between each two, if it is defined. 



( '\ 

1 

V 0/ 



/ 2.13 [Ohanian] During maneuvers preceding the Battle of Jutland, the British battle 
cruiser Lion moved as follows (in nautical miles): 1.2 miles north, 6.1 miles 38 
degrees east of south, 4.0 miles at 89 degrees east of north, and 6.5 miles at 31 
degrees east of north. Find the distance between starting and ending positions. 
(Ignore the earth's curvature.) 
2.14 Find k so that these two vectors are perpendicular. 

/4^ 



2.15 Describe the set of vectors in orthogonal to this one. 

G) 

/ 2.16 (a) Find the angle between the diagonal of the unit square in ℝ² and one of
the axes.

(b) Find the angle between the diagonal of the unit cube in ℝ³ and one of the
axes.

(c) Find the angle between the diagonal of the unit cube in ℝⁿ and one of the
axes.

(d) What is the limit, as n goes to ∞, of the angle between the diagonal of the
unit cube in ℝⁿ and one of the axes?

2.17 Is any vector perpendicular to itself? 
/ 2.18 Describe the algebraic properties of dot product. 

(a) Is it right-distributive over addition: (u + v)·w = u·w + v·w?

(b) Is it left-distributive (over addition)? 

(c) Does it commute? 

(d) Associate? 

(e) How does it interact with scalar multiplication? 

As always, you must back any assertion with either a proof or an example. 
2.19 Verify the equality condition in Corollary 2.6, the Cauchy-Schwarz Inequality.

(a) Show that if u is a negative scalar multiple of v then u·v and v·u are less
than or equal to zero.

(b) Show that |u·v| = ||u|| ||v|| if and only if one vector is a scalar multiple of the
other.

2.20 Suppose that u·v = u·w and u ≠ 0. Must v = w?
/ 2.21 Does any vector have length zero except a zero vector? (If "yes", produce an 

example. If "no", prove it.) 
/ 2.22 Find the midpoint of the line segment connecting (x1, y1) with (x2, y2) in ℝ².

Generalize to ℝⁿ.

2.23 Show that if v ≠ 0 then v/||v|| has length one. What if v = 0?

2.24 Show that if r ≥ 0 then rv is r times as long as v. What if r < 0?

/ 2.25 A vector v ∈ ℝⁿ of length one is a unit vector. Show that the dot product
of two unit vectors has absolute value less than or equal to one. Can 'less than' 
happen? Can 'equal to' ? 

2.26 Prove that ||u + v||² + ||u − v||² = 2||u||² + 2||v||².

2.27 Show that if x·y = 0 for every y then x = 0.

2.28 Is ||u1 + ··· + un|| ≤ ||u1|| + ··· + ||un||? If it is true then it would generalize
the Triangle Inequality. 

2.29 What is the ratio between the sides in the Cauchy-Schwarz inequality?

2.30 Why is the zero vector defined to be perpendicular to every vector? 

2.31 Describe the angle between two vectors in ℝ¹.

2.32 Give a simple necessary and sufficient condition to determine whether the angle 
between two vectors is acute, right, or obtuse. 

/ 2.33 Generalize to ℝⁿ the converse of the Pythagorean Theorem, that if u and v are
perpendicular then

||u + v||² = ||u||² + ||v||².

2.34 Show that ||u|| = ||v|| if and only if u + v and u − v are perpendicular. Give an
example in ℝ².

2.35 Show that if a vector is perpendicular to each of two others then it is perpen- 
dicular to each vector in the plane they generate. {Remark. They could generate a 
degenerate plane — a line or a point — but the statement remains true.) 

2.36 Prove that, where u, v ∈ ℝⁿ are nonzero vectors, the vector

\[ \frac{u}{\|u\|} + \frac{v}{\|v\|} \]

bisects the angle between them. Illustrate in ℝ².

2.37 Verify that the definition of angle is dimensionally correct: (1) if k > 0 then the
cosine of the angle between ku and v equals the cosine of the angle between u and
v, and (2) if k < 0 then the cosine of the angle between ku and v is the negative of
the cosine of the angle between u and v.

/ 2.38 Show that the inner product operation is linear: for u, v, w ∈ ℝⁿ and k, m ∈ ℝ,
u·(kv + mw) = k(u·v) + m(u·w).

/ 2.39 The geometric mean of two positive reals x, y is √(xy). It is analogous to the
arithmetic mean (x + y)/2. Use the Cauchy-Schwarz inequality to show that the
geometric mean of any x, y ∈ ℝ⁺ is less than or equal to the arithmetic mean.

? 2.40 [Cleary] Astrologers claim to be able to recognize trends in personality and 
fortune that depend on an individual's birthday by somehow incorporating where 
the stars were 2000 years ago, during the Hellenistic period. Suppose that instead 
of star-gazers coming up with stuff, math teachers who like linear algebra (we'll 
call them vectologers) had come up with a similar system as follows: Consider your 
birthday as a row vector (month day). For instance, I was born on July 12 so my
vector would be (7 12). Vectologers have made the rule that how well individuals
get along with each other depends on the angle between vectors. The smaller the 
angle, the more harmonious the relationship. 

(a) Compute the angle between your vector and mine, expressing the answer in 
radians. 

(b) Would you get along better with me, or with a professor born on September 19?

(c) For maximum harmony in a relationship, when should the other person be 
born? 

(d) Is there a person with whom you have a "worst case" relationship, i.e., your 
vector and theirs are orthogonal? If so, what are the birthdate(s) for such people? 
If not, explain why not. 

? 2.41 [Am. Math. Mon., Feb. 1933] A ship is sailing with speed and direction v1; the
wind blows apparently (judging by the vane on the mast) in the direction of a
vector a; on changing the direction and speed of the ship from v1 to v2 the apparent
wind is in the direction of a vector b.
Find the vector velocity of the wind.
2.42 Verify the Cauchy-Schwarz inequality by first proving Lagrange's identity:



of the Σ notation.) This result is an improvement over Cauchy-Schwarz because it
gives a formula for the difference between the two sides. Interpret that difference
in ℝ³.





1 ^j^n 
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III Reduced Echelon Form 

After developing the mechanics of Gauss's Method, we observed that it can be 
done in more than one way. For example, from this matrix 




we could derive any of these three echelon form matrices. 






The first results from $-2\rho_1 + \rho_2$. The second comes from following $(1/2)\rho_1$ with
$-4\rho_1 + \rho_2$. The third comes from $-2\rho_1 + \rho_2$ followed by $2\rho_2 + \rho_1$ (after the
first row combination the matrix is already in echelon form so the second one is
extra work but it is nonetheless a legal row operation).

The fact that echelon form is not unique raises questions. Will any two 
echelon form versions of a linear system have the same number of free variables? 
If yes, will the two have exactly the same free variables? In this section we will 
give a way to decide if one linear system can be derived from another by row 
operations. The answers to both questions, both "yes," will follow from this. 



III.1 Gauss-Jordan Reduction

Gaussian elimination coupled with back-substitution solves linear systems but
it is not the only method possible. Here is an extension of Gauss's Method that 
has some advantages. 

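1.1 Example To solve

x + y - 2z = -2
    y + 3z = 7
x      - z = -1

we can start as usual by going to echelon form.

\[
\left(\begin{array}{ccc|c} 1&1&-2&-2\\ 0&1&3&7\\ 1&0&-1&-1 \end{array}\right)
\xrightarrow{-\rho_1+\rho_3}
\left(\begin{array}{ccc|c} 1&1&-2&-2\\ 0&1&3&7\\ 0&-1&1&1 \end{array}\right)
\xrightarrow{\rho_2+\rho_3}
\left(\begin{array}{ccc|c} 1&1&-2&-2\\ 0&1&3&7\\ 0&0&4&8 \end{array}\right)
\]

We can keep going to a second stage by making the leading entries into 1's

\[
\xrightarrow{(1/4)\rho_3}
\left(\begin{array}{ccc|c} 1&1&-2&-2\\ 0&1&3&7\\ 0&0&1&2 \end{array}\right)
\]

and then to a third stage that uses the leading entries to eliminate all of the
other entries in each column by combining upwards.

\[
\xrightarrow[2\rho_3+\rho_1]{-3\rho_3+\rho_2}
\left(\begin{array}{ccc|c} 1&1&0&2\\ 0&1&0&1\\ 0&0&1&2 \end{array}\right)
\xrightarrow{-\rho_2+\rho_1}
\left(\begin{array}{ccc|c} 1&0&0&1\\ 0&1&0&1\\ 0&0&1&2 \end{array}\right)
\]

The answer is x = 1, y = 1, and z = 2.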

Using one entry to clear out the rest of a column is pivoting on that entry. 

Note that the row combination operations in the first stage move left to right, 
from column one to column three, while the combination operations in the third 
stage move right to left. 
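
The three stages of Example 1.1 can be sketched in code. This is only an illustration in Python, under assumptions of our own: the name gauss_jordan and the list-of-rows representation are not from the text, and exact fractions from the standard library stand in for hand arithmetic.

```python
from fractions import Fraction

def gauss_jordan(rows):
    """Return the reduced echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    leads = []  # (row, column) position of each leading entry
    r = 0
    # Stage 1: go to echelon form, working left to right.
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = next((k for k in range(r, n_rows) if m[k][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]              # row swap, if needed
        for k in range(r + 1, n_rows):               # combine downwards
            factor = m[k][c] / m[r][c]
            m[k] = [a - factor * b for a, b in zip(m[k], m[r])]
        leads.append((r, c))
        r += 1
    # Stage 2: turn each leading entry into a 1.
    for r, c in leads:
        lead = m[r][c]
        m[r] = [a / lead for a in m[r]]
    # Stage 3: clear the entries above each leading 1, working right to left.
    for r, c in reversed(leads):
        for k in range(r):
            factor = m[k][c]
            m[k] = [a - factor * b for a, b in zip(m[k], m[r])]
    return m

# Augmented matrix of the system in Example 1.1.
example_1_1 = [[1, 1, -2, -2],
               [0, 1, 3, 7],
               [1, 0, -1, -1]]
reduced = gauss_jordan(example_1_1)
solution = [row[-1] for row in reduced]
print([int(v) for v in solution])  # [1, 1, 2]
```

The final column of the reduced matrix holds x, y, and z because every variable leads a row.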

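1.2 Example The middle stage operations that turn the leading entries into 1's
don't interact, so we can combine multiple ones into a single step.

\[
\left(\begin{array}{cc|c} 2&1&7\\ 4&-2&6 \end{array}\right)
\xrightarrow{-2\rho_1+\rho_2}
\left(\begin{array}{cc|c} 2&1&7\\ 0&-4&-8 \end{array}\right)
\xrightarrow[(-1/4)\rho_2]{(1/2)\rho_1}
\left(\begin{array}{cc|c} 1&1/2&7/2\\ 0&1&2 \end{array}\right)
\xrightarrow{-(1/2)\rho_2+\rho_1}
\left(\begin{array}{cc|c} 1&0&5/2\\ 0&1&2 \end{array}\right)
\]

The answer is x = 5/2 and y = 2.

This extension of Gauss's Method is Gauss-Jordan reduction.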




1.3 Definition A matrix or linear system is in reduced echelon form if, in addition 
to being in echelon form, each leading entry is a one and is the only nonzero 
entry in its column. 

The cost of using Gauss-Jordan reduction to solve a system is the additional
arithmetic. The benefit is that we can just read off the solution set from the
reduced echelon form.

In any echelon form, reduced or not, we can read off when the system has an 
empty solution set because there is a contradictory equation. We can read off 
when the system has a one-element solution set because there is no contradiction 
and every variable is the leading variable in some row. And, we can read off 
when the system has an infinite solution set because there is no contradiction 
and at least one variable is free. 
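
As a sketch of that reading-off, here is a small Python function that takes an augmented matrix already in echelon form and reports which of the three cases holds. The function name classify and the sample matrices other than the first are hypothetical, chosen only for illustration.

```python
def classify(echelon_rows):
    """Classify the solution set from an echelon form augmented matrix.

    Each row lists the coefficients followed by the constant.
    """
    n_vars = len(echelon_rows[0]) - 1
    leading_columns = set()
    for row in echelon_rows:
        coefficients, constant = row[:-1], row[-1]
        nonzero = [j for j, a in enumerate(coefficients) if a != 0]
        if not nonzero:
            if constant != 0:
                return "empty"       # contradictory equation 0 = nonzero
            continue                 # a 0 = 0 row gives no information
        leading_columns.add(nonzero[0])
    if len(leading_columns) == n_vars:
        return "one element"         # every variable leads some row
    return "infinitely many elements"  # at least one variable is free

# Echelon form reached in the first stage of Example 1.1.
print(classify([[1, 1, -2, -2], [0, 1, 3, 7], [0, 0, 4, 8]]))  # one element
# Two made-up echelon form systems in x and y.
print(classify([[1, 1, 2], [0, 0, 3]]))  # empty
print(classify([[1, 1, 2], [0, 0, 0]]))  # infinitely many elements
```

The function assumes its input really is in echelon form; it does no reduction itself.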

In reduced echelon form we can read off not just the size of the solution set
but also its description. We have no trouble describing the solution set when it
is empty, of course. Examples 1.1 and 1.2 show that in the single-element case
the single element is in the column of constants. The next example
shows how to read the parametrization of an infinite solution set from reduced
echelon form.

1.4 Example

As a linear system this is

$x_1 - (1/2)x_3$
$x_2 + (1/3)x_3$

so a solution set description is this.

Thus echelon form isn't some kind of one best form for systems. Other forms,
such as reduced echelon form, have advantages and disadvantages. Instead of 
picturing linear systems (and the associated matrices) as things we operate 
on, always directed toward the goal of echelon form, we can think of them as 
interrelated when we can get from one to another by row operations. The rest 
of this subsection develops this relationship. 



1.5 Lemma Elementary row operations are reversible.

Proof For any matrix A, the effect of swapping rows is reversed by swapping
them back, multiplying a row by a nonzero k is undone by multiplying by 1/k,
and adding a multiple of row i to row j (with $i \neq j$) is undone by subtracting
the same multiple of row i from row j.

\[
A \xrightarrow{\rho_i\leftrightarrow\rho_j}\ \xrightarrow{\rho_j\leftrightarrow\rho_i} A
\qquad
A \xrightarrow{k\rho_i}\ \xrightarrow{(1/k)\rho_i} A
\qquad
A \xrightarrow{k\rho_i+\rho_j}\ \xrightarrow{-k\rho_i+\rho_j} A
\]

(We need the $i \neq j$ condition; see Exercise 1.16.) QED
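
The lemma can be checked on a sample matrix. The sketch below is illustrative only: the helper names swap, scale, and combine, the matrix A, and the scalar k are assumptions of ours, and rows are indexed from 0 as is usual in Python.

```python
from fractions import Fraction

def swap(m, i, j):
    """Swap rows i and j."""
    m = [row[:] for row in m]
    m[i], m[j] = m[j], m[i]
    return m

def scale(m, k, i):
    """Multiply row i by the nonzero scalar k."""
    return [[k * x for x in row] if r == i else row[:] for r, row in enumerate(m)]

def combine(m, k, i, j):
    """Add k times row i to row j."""
    new_j = [k * a + b for a, b in zip(m[i], m[j])]
    return [new_j if r == j else row[:] for r, row in enumerate(m)]

A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
k = Fraction(5)

print(swap(swap(A, 0, 1), 1, 0) == A)            # True
print(scale(scale(A, k, 0), 1 / k, 0) == A)      # True
print(combine(combine(A, k, 0, 1), -k, 0, 1) == A)  # True

# With i = j the first operation wipes out the row, so nothing can undo it.
zeroed = combine(A, -1, 0, 0)
print(combine(zeroed, 1, 0, 0) == A)             # False
```

Each helper returns a new matrix rather than changing its argument, so A can be compared with the result of doing and then undoing an operation.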

Again, the point of view that we are developing, buttressed now by this
lemma, is that the term 'reduces to' is misleading: where $A \to B$, we shouldn't
think of B as "after" A or "simpler than" A. Instead we should think of them
as inter-reducible or interrelated. Below is a picture of the idea. It shows the
matrices from the start of this section and their reduced echelon form version in
a cluster as inter-reducible.

We say that matrices that reduce to each other are 'equivalent with respect
to the relationship of row reducibility'. The next result justifies this using the
definition of an equivalence.*

1.6 Lemma Between matrices, 'reduces to' is an equivalence relation. 

Proof We must check the conditions (i) reflexivity, that any matrix reduces 
to itself, (ii) symmetry, that if A reduces to B then B reduces to A, and 
(iii) transitivity, that if A reduces to B and B reduces to C then A reduces to C. 

Reflexivity is easy; any matrix reduces to itself in zero row operations.

The relationship is symmetric by the prior lemma — if A reduces to B by 
some row operations then also B reduces to A by reversing those operations. 

For transitivity, suppose that A reduces to B and that B reduces to C.
Following the reduction steps from $A \to \cdots \to B$ with those from $B \to \cdots \to C$
gives a reduction from A to C. QED

1.7 Definition Two matrices that are inter-reducible by elementary row operations 
are row equivalent. 

The diagram below shows the collection of all matrices as a box. Inside that
box, each matrix lies in some class. Matrices are in the same class if and only if
they are inter-reducible. The classes are disjoint: no matrix is in two distinct
classes. We have partitioned the collection of matrices into row equivalence
classes.†



One of the classes in this partition is the cluster of matrices from the start of this
section shown above, expanded to include all of the nonsingular 2×2 matrices.

The next subsection proves that the reduced echelon form of a matrix is 
unique. Rephrased in terms of the row-equivalence relationship, we shall prove 
that every matrix is row equivalent to one and only one reduced echelon form 
matrix. In terms of the partition what we shall prove is: every equivalence class 
contains one and only one reduced echelon form matrix. So each reduced echelon 
form matrix serves as a representative of its class. 

* More information on equivalence relations is in the appendix. 

† More information on partitions and class representatives is in the appendix.

Exercises

/ 1.8 Use Gauss-Jordan reduction to solve each system.

(a) x + y = 2      (b) x - z = 4        (c) 3x - 2y = 1
    x - y = 0          2x + 2y = 1          6x + y = 1/2

(d) 2x - y = -1
    x + 3y - z = 5
    y + 2z = 5

/ 1.9 Find the reduced echelon form of each matrix.
(a)-(d) [matrices not recoverable from this copy]



/ 1.10 Find each solution set by using Gauss-Jordan reduction and then reading off
the parametrization.

(a) 2x + y - z = 1
    4x - y = 3

(b) x - z = 1
    y + 2z - w = 3
    x + 2y + 3z - w = 7

(c) x - y + z = 0
    y + w = 0
    3x - 2y + 3z + w = 0
    -y - w = 0

(d) a + 2b + 3c + d - e = 1
    3a - b + c + d + e = 3
1.11 Give two distinct echelon form versions of this matrix.

\[
\begin{pmatrix} 2&1&1&3\\ 6&4&1&2\\ 1&5&1&5 \end{pmatrix}
\]

/ 1.12 List the reduced echelon forms possible for each size.

(a) 2×2 (b) 2×3 (c) 3×2 (d) 3×3

/ 1.13 What results from applying Gauss-Jordan reduction to a nonsingular matrix?

1.14 [Cleary] Consider the following relationship on the set of 2×2 matrices: we say
that A is sum-what like B if the sum of all of the entries in A is the same as the
sum of all the entries in B. For instance, the zero matrix would be sum-what like
the matrix whose first row had two sevens, and whose second row had two negative
sevens. Prove or disprove that this is an equivalence relation on the set of 2×2
matrices.

1.15 [Cleary] Consider the set of students in a class. Which of the following
relationships are equivalence relations? Explain each answer in at least a
sentence.

(a) Two students x and y are related if x has taken at least as many math classes 
as y. 

(b) Students x and y are related if x and y have names that start with the same 
letter. 

1.16 The proof of Lemma 1.5 contains a reference to the $i \neq j$ condition on the row
combination operation.

(a) The definition of row operations has an $i \neq j$ condition on the swap operation
$\rho_i \leftrightarrow \rho_j$. Show that in $A \xrightarrow{\rho_i\leftrightarrow\rho_j}\ \xrightarrow{\rho_i\leftrightarrow\rho_j} A$ this condition is not needed.

(b) Write down a 2×2 matrix with nonzero entries, and show that the $-1\cdot\rho_1 + \rho_1$
operation is not reversed by $1\cdot\rho_1 + \rho_1$.

(c) Expand the proof of that lemma to make explicit exactly where it uses the
$i \neq j$ condition on combining.



III.2 The Linear Combination Lemma

We will close this section and this chapter by proving that every matrix is row 
equivalent to one and only one reduced echelon form matrix. The ideas that 
appear here will reappear, and be further developed, in the next chapter. 

The crucial observation concerns how row operations act to transform one 
matrix into another: they combine the rows linearly. 

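2.1 Example In this reduction

\[
\begin{pmatrix} 2&1&0\\ 1&3&5 \end{pmatrix}
\xrightarrow{-(1/2)\rho_1+\rho_2}
\begin{pmatrix} 2&1&0\\ 0&5/2&5 \end{pmatrix}
\xrightarrow[(2/5)\rho_2]{(1/2)\rho_1}
\begin{pmatrix} 1&1/2&0\\ 0&1&2 \end{pmatrix}
\xrightarrow{-(1/2)\rho_2+\rho_1}
\begin{pmatrix} 1&0&-1\\ 0&1&2 \end{pmatrix}
\]

denoting those matrices $A \to D \to G \to B$ and writing the rows of A as $\alpha_1$ and
$\alpha_2$, etc., we have this.

\[
\begin{pmatrix} \alpha_1\\ \alpha_2 \end{pmatrix}
\xrightarrow{-(1/2)\rho_1+\rho_2}
\begin{pmatrix} \delta_1 = \alpha_1\\ \delta_2 = -(1/2)\alpha_1+\alpha_2 \end{pmatrix}
\xrightarrow[(2/5)\rho_2]{(1/2)\rho_1}
\begin{pmatrix} \gamma_1 = (1/2)\alpha_1\\ \gamma_2 = -(1/5)\alpha_1+(2/5)\alpha_2 \end{pmatrix}
\xrightarrow{-(1/2)\rho_2+\rho_1}
\begin{pmatrix} \beta_1 = (3/5)\alpha_1-(1/5)\alpha_2\\ \beta_2 = -(1/5)\alpha_1+(2/5)\alpha_2 \end{pmatrix}
\]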



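2.2 Example This also holds if there is a row swap. With this A, D, G, and B

\[
\begin{pmatrix} 0&2\\ 1&1 \end{pmatrix}
\xrightarrow{\rho_1\leftrightarrow\rho_2}
\begin{pmatrix} 1&1\\ 0&2 \end{pmatrix}
\xrightarrow{(1/2)\rho_2}
\begin{pmatrix} 1&1\\ 0&1 \end{pmatrix}
\xrightarrow{-\rho_2+\rho_1}
\begin{pmatrix} 1&0\\ 0&1 \end{pmatrix}
\]

we get these linear relationships.

\[
\begin{pmatrix} \alpha_1\\ \alpha_2 \end{pmatrix}
\xrightarrow{\rho_1\leftrightarrow\rho_2}
\begin{pmatrix} \delta_1 = \alpha_2\\ \delta_2 = \alpha_1 \end{pmatrix}
\xrightarrow{(1/2)\rho_2}
\begin{pmatrix} \gamma_1 = \alpha_2\\ \gamma_2 = (1/2)\alpha_1 \end{pmatrix}
\xrightarrow{-\rho_2+\rho_1}
\begin{pmatrix} \beta_1 = (-1/2)\alpha_1 + 1\cdot\alpha_2\\ \beta_2 = (1/2)\alpha_1 \end{pmatrix}
\]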

In summary, Gauss's Method systematically finds a suitable sequence of
linear combinations of the rows.

2.3 Lemma (Linear Combination Lemma) A linear combination of linear combinations
is a linear combination.

Proof Given the set $c_{1,1}x_1 + \cdots + c_{1,n}x_n$ through $c_{m,1}x_1 + \cdots + c_{m,n}x_n$ of
linear combinations of the x's, consider a combination of those

\[
d_1(c_{1,1}x_1 + \cdots + c_{1,n}x_n) + \cdots + d_m(c_{m,1}x_1 + \cdots + c_{m,n}x_n)
\]

where the d's are scalars along with the c's. Distributing those d's and regrouping
gives

\[
= (d_1c_{1,1} + \cdots + d_mc_{m,1})x_1 + \cdots + (d_1c_{1,n} + \cdots + d_mc_{m,n})x_n,
\]

which is also a linear combination of the x's. QED
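
The regrouping in that proof can be carried out numerically. The sketch below, in Python, composes tables of coefficients; the function name compose and the variable names are our own, and the coefficient values are the ones that the row operations of Example 2.1 give for each step.

```python
from fractions import Fraction as F

def compose(outer, inner):
    """Express the outer combinations directly in terms of the inner inputs.

    inner[k][j] is the coefficient of x_j in the k-th inner combination;
    outer[i][k] is the coefficient of that k-th combination in the i-th
    outer one. Entry (i, j) of the result is d_1 c_{1,j} + ... + d_m c_{m,j}.
    """
    return [[sum(outer[i][k] * inner[k][j] for k in range(len(inner)))
             for j in range(len(inner[0]))]
            for i in range(len(outer))]

# One step of Example 2.1 at a time: each row in terms of the prior rows.
d_from_a = [[F(1), F(0)], [F(-1, 2), F(1)]]    # -(1/2)rho_1 + rho_2
g_from_d = [[F(1, 2), F(0)], [F(0), F(2, 5)]]  # (1/2)rho_1 and (2/5)rho_2
b_from_g = [[F(1), F(-1, 2)], [F(0), F(1)]]    # -(1/2)rho_2 + rho_1

g_from_a = compose(g_from_d, d_from_a)
b_from_a = compose(b_from_g, g_from_a)
print(b_from_a == [[F(3, 5), F(-1, 5)], [F(-1, 5), F(2, 5)]])  # True
```

The final table matches the coefficients that Example 2.1 lists for the rows of B in terms of the rows of A.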

2.4 Corollary Where one matrix reduces to another, each row of the second is a 
linear combination of the rows of the first. 

The proof uses induction.* Before we proceed, here is an outline of the 
argument. For the base step, we will verify that the proposition is true when 
reduction can be done in zero row operations. For the inductive step, we will 
argue that if being able to reduce the first matrix to the second in some number
$t \geq 0$ of operations implies that each row of the second is a linear combination
of the rows of the first, then being able to reduce the first to the second in t + 1
operations implies the same thing. Together these prove the result because the 
base step shows that it is true in the zero operations case, and then the inductive 
step implies that it is true in the one operation case, and then the inductive 
step applied again gives that it is therefore true for two operations, etc. 

Proof We proceed by induction on the minimum number of row operations that
take a first matrix A to a second one B. In the base step, that zero reduction
operations suffice, the two matrices are equal and each row of B is trivially a
combination of A's rows: $\beta_i = 0\cdot\alpha_1 + \cdots + 1\cdot\alpha_i + \cdots + 0\cdot\alpha_m$.

For the inductive step, assume the inductive hypothesis: with $t \geq 0$, any
matrix that can be derived from A in t or fewer operations has rows
that are linear combinations of A's rows. Suppose that reducing from A to
B requires t + 1 operations. There must be a next-to-last matrix G so that
$A \to \cdots \to G \to B$. The inductive hypothesis applies to this G because it is
only t operations away from A. That is, each row of G is a linear combination
of the rows of A.

If the operation taking G to B is a row swap then the rows of B are just the
rows of G reordered, and thus each row of B is a linear combination of the rows
of G. If the operation taking G to B is multiplication of some row i by a scalar c
then the rows of B are linear combinations of the rows of G; in particular,
$\beta_i = c\gamma_i$. And if the operation is adding a multiple of one row to another then
clearly the rows of B are linear combinations of the rows of G. In all three cases
the Linear Combination Lemma applies to show that each row of B is a linear
combination of the rows of A.

With both a base step and an inductive step, the proposition follows by the
principle of mathematical induction. QED

* More information on mathematical induction is in the appendix.

We now have the insight that Gauss's Method builds linear combinations 
of the rows. But of course the goal is to end in echelon form since it is a 
particularly basic version of a linear system, because echelon form is suitable for 
back substitution as it has isolated the variables. For instance, in this matrix 

3 7 8 0\ 
15 11 
3 3 
2 1/ 

$x_1$ has been removed from $x_5$'s equation. That is, Gauss's Method has made
$x_5$'s row independent of $x_1$'s row, in some sense.

The following result makes this precise. What Gauss's linear elimination 
method eliminates is linear relationships among the rows. 

2.5 Lemma In an echelon form matrix, no nonzero row is a linear combination 
of the other nonzero rows. 

Proof Let R be in echelon form and consider the nonzero rows. First observe
that if we have a row written as a combination of the others
$\rho_i = c_1\rho_1 + \cdots + c_{i-1}\rho_{i-1} + c_{i+1}\rho_{i+1} + \cdots + c_m\rho_m$ then we can rewrite that equation as

\[
\vec{0} = c_1\rho_1 + \cdots + c_{i-1}\rho_{i-1} + c_i\rho_i + c_{i+1}\rho_{i+1} + \cdots + c_m\rho_m \tag{$*$}
\]

where not all the coefficients are zero; specifically, $c_i = -1$. The converse
holds also: given equation ($*$) where some $c_i \neq 0$ we could express $\rho_i$ as a
combination of the other rows by moving $c_i\rho_i$ to the left side and dividing by
$-c_i$. Therefore we will have proved the lemma if we show that in ($*$) all of the
coefficients are 0. For that we use induction on the row index i.

The base case is the first row i = 1 (if there is no such nonzero row, so R is
the zero matrix, then the lemma holds vacuously). Recall our notation that $\ell_i$
is the column number of the leading entry in row i. Equation ($*$) applied to the
entries of the rows from column $\ell_1$ gives this.

\[
0 = c_1 r_{1,\ell_1} + c_2 r_{2,\ell_1} + \cdots + c_m r_{m,\ell_1}
\]

The matrix is in echelon form so every row after the first has a zero entry in
that column: $r_{2,\ell_1} = \cdots = r_{m,\ell_1} = 0$. Thus $c_1 = 0$ because $r_{1,\ell_1} \neq 0$, as it leads
the row.

The inductive step is to prove this implication: if for each row index
$k \in \{1, \ldots, i\}$ the coefficient $c_k$ is 0 then $c_{i+1}$ is also 0. Consider the entries from
column $\ell_{i+1}$ in equation ($*$).

\[
0 = c_1 r_{1,\ell_{i+1}} + \cdots + c_{i+1} r_{i+1,\ell_{i+1}} + \cdots + c_m r_{m,\ell_{i+1}}
\]

By the inductive hypothesis the coefficients $c_1, \ldots, c_i$ are all 0 so the equation
reduces to $0 = c_{i+1} r_{i+1,\ell_{i+1}} + \cdots + c_m r_{m,\ell_{i+1}}$. As in the base case, because the
matrix is in echelon form $r_{i+2,\ell_{i+1}} = \cdots = r_{m,\ell_{i+1}} = 0$ and $r_{i+1,\ell_{i+1}} \neq 0$. Thus
$c_{i+1} = 0$. QED

2.6 Theorem Each matrix is row equivalent to a unique reduced echelon form 
matrix. 

Proof [Yuster] Fix a number of rows m. We will proceed by induction on the 
number of columns n. 

The base case is that the matrix has n = 1 column. If this is the zero matrix
then its unique echelon form is the zero matrix. If instead it has any nonzero
entries then when the matrix is brought to reduced echelon form it must have
at least one nonzero entry, so it has a 1 in the first row. Either way, its reduced
echelon form is unique.

For the inductive step we assume that n > 1 and that all m row matrices
with fewer than n columns have a unique reduced echelon form. Consider an
m×n matrix A and suppose that B and C are two reduced echelon form matrices
derived from A. We will show that these two must be equal.

Let $\hat{A}$ be the matrix consisting of the first n - 1 columns of A. Observe
that any sequence of row operations that bring A to reduced echelon form will
also bring $\hat{A}$ to reduced echelon form. By the inductive hypothesis this reduced
echelon form of $\hat{A}$ is unique, so if B and C differ then the difference must occur
in their n-th columns.

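We finish the inductive step, and the argument, by showing that the two
cannot differ only in that column. Consider a homogeneous system of equations
for which A is the matrix of coefficients.

\[
\begin{aligned}
a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n &= 0\\
a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n &= 0\\
&\;\;\vdots\\
a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,n}x_n &= 0
\end{aligned}
\tag{$*$}
\]

By Theorem One.I.1.5 the set of solutions to that system is the same as the set
of solutions to B's system

\[
\begin{aligned}
b_{1,1}x_1 + b_{1,2}x_2 + \cdots + b_{1,n}x_n &= 0\\
b_{2,1}x_1 + b_{2,2}x_2 + \cdots + b_{2,n}x_n &= 0\\
&\;\;\vdots\\
b_{m,1}x_1 + b_{m,2}x_2 + \cdots + b_{m,n}x_n &= 0
\end{aligned}
\tag{$**$}
\]

and to C's.

\[
\begin{aligned}
c_{1,1}x_1 + c_{1,2}x_2 + \cdots + c_{1,n}x_n &= 0\\
c_{2,1}x_1 + c_{2,2}x_2 + \cdots + c_{2,n}x_n &= 0\\
&\;\;\vdots\\
c_{m,1}x_1 + c_{m,2}x_2 + \cdots + c_{m,n}x_n &= 0
\end{aligned}
\tag{$***$}
\]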


With B and C different only in column n, suppose that they differ in row i.
Subtract row i of ($***$) from row i of ($**$) to get the equation $(b_{i,n} - c_{i,n})\cdot x_n = 0$.
We've assumed that $b_{i,n} \neq c_{i,n}$ so the system solution includes that $x_n = 0$.
Thus in ($**$) and ($***$) the n-th column contains a leading entry, or else the
variable $x_n$ would be free. That's a contradiction because with B and C equal on
the first n - 1 columns, the leading entries in the n-th column would have to be
in the same row, and with both matrices in reduced echelon form, both leading
entries would have to be 1, and would have to be the only nonzero entries in
that column. Thus B = C. QED

That result answers the two questions that we posed in the introduction to 
this section: do any two echelon form versions of a linear system have the same 
number of free variables, and if so are they exactly the same variables? We get 
from any echelon form version to the reduced echelon form by pivoting up, and
so uniqueness of reduced echelon form implies that the same variables are free
in all echelon form versions of a system. Thus both questions are answered "yes."
There is no linear system and no combination of row operations such that, say, 
we could solve the system one way and get y and z free but solve it another way 
and get y and w free. 

We end this section with a recap. In Gauss's Method we start with a matrix 
and then derive a sequence of other matrices. We defined two matrices to be 
related if we can derive one from the other. That relation is an equivalence 
relation, called row equivalence, and so partitions the set of all matrices into 
row equivalence classes. 




(There are infinitely many matrices in the pictured class, but we've only got 
room to show two.) We have proved there is one and only one reduced echelon 
form matrix in each row equivalence class. So the reduced echelon form is a 
canonical form* for row equivalence: the reduced echelon form matrices are 
representatives of the classes. 




* More information on canonical representatives is in the appendix.

The underlying theme here is that one way to understand a mathematical
situation is by being able to classify the cases that can happen. We have seen
this theme several times already. We classified solution sets of linear systems
into the no-elements, one-element, and infinitely-many elements cases. We also
classified linear systems with the same number of equations as unknowns into
the nonsingular and singular cases. These classifications helped us understand
the situations that we were investigating. Here, where we are investigating row
equivalence, we know that the set of all matrices breaks into the row equivalence
classes and we now have a way to put our finger on each of those classes: we
can think of the matrices in a class as derived by row operations from the unique
reduced echelon form matrix in that class.

Put in more operational terms, uniqueness of reduced echelon form lets us
answer questions about the classes by translating them into questions about the
representatives. For instance, we now (as promised in this section's opening)
can decide whether one matrix can be derived from another by row reduction.
We apply the Gauss-Jordan procedure to both and see if they yield the same
reduced echelon form.
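
That test is easy to mechanize. Here is a sketch in Python; the function names reduced_echelon_form and row_equivalent are our own, the sample matrices are made up for illustration, and this version clears each column as soon as it finds the leading entry rather than working in three separate stages.

```python
from fractions import Fraction

def reduced_echelon_form(rows):
    """Gauss-Jordan reduce a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    lead_row = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(lead_row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue                                   # no leading entry here
        m[lead_row], m[pivot] = m[pivot], m[lead_row]
        lead = m[lead_row][col]
        m[lead_row] = [x / lead for x in m[lead_row]]  # leading entry becomes 1
        for r in range(len(m)):                        # clear rest of the column
            if r != lead_row:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[lead_row])]
        lead_row += 1
        if lead_row == len(m):
            break
    return m

def row_equivalent(a, b):
    """Two same-sized matrices are row equivalent exactly when their
    reduced echelon forms are equal."""
    same_size = len(a) == len(b) and len(a[0]) == len(b[0])
    return same_size and reduced_echelon_form(a) == reduced_echelon_form(b)

print(row_equivalent([[1, 2], [3, 4]], [[2, 0], [0, 5]]))  # True
print(row_equivalent([[1, 2], [2, 4]], [[1, 2], [3, 4]]))  # False
```

The first pair are both nonsingular, so both reduce to the identity; in the second pair one matrix is singular and the other is not.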

2.7 Example These matrices are not row equivalent 




because their reduced echelon forms are not equal. 

(o o) (o l) 

2.8 Example Any nonsingular 3×3 matrix Gauss-Jordan reduces to this.

\[
\begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix}
\]

2.9 Example We can describe all the classes by listing all possible reduced echelon
form matrices. Any 2×2 matrix lies in one of these: the class of matrices row
equivalent to this,

\[
\begin{pmatrix} 0&0\\ 0&0 \end{pmatrix}
\]

the infinitely many classes of matrices row equivalent to one of this type

\[
\begin{pmatrix} 1&a\\ 0&0 \end{pmatrix}
\]

where $a \in \mathbb{R}$ (including a = 0), the class of matrices row equivalent to this,

\[
\begin{pmatrix} 0&1\\ 0&0 \end{pmatrix}
\]

and the class of matrices row equivalent to this

\[
\begin{pmatrix} 1&0\\ 0&1 \end{pmatrix}
\]

(this is the class of nonsingular 2×2 matrices).
Exercises 

/ 2.10 Decide if the matrices are row equivalent.

[matrix pairs not recoverable from this copy]

2.11 Describe the matrices in each of the classes represented in Example 2.9. 

2.12 Describe all matrices in the row equivalence class of these.

[matrices not recoverable from this copy]

2.13 How many row equivalence classes are there? 

2.14 Can row equivalence classes contain different-sized matrices? 

2.15 How big are the row equivalence classes? 

(a) Show that for any matrix of all zeros, the class is finite. 

(b) Do any other classes contain only finitely many members? 

/ 2.16 Give two reduced echelon form matrices that have their leading entries in the
same columns, but that are not row equivalent.
/ 2.17 Show that any two n×n nonsingular matrices are row equivalent. Are any two
singular matrices row equivalent?
/ 2.18 Describe all of the row equivalence classes containing these.
(a) 2×2 matrices (b) 2×3 matrices (c) 3×2 matrices
(d) 3×3 matrices

2.19 (a) Show that a vector $\beta_0$ is a linear combination of members of the set
$\{\beta_1, \ldots, \beta_n\}$ if and only if there is a linear relationship $\vec{0} = c_0\beta_0 + \cdots + c_n\beta_n$
where $c_0$ is not zero. (Hint. Watch out for the $\beta_0 = \vec{0}$ case.)
(b) Use that to simplify the proof of Lemma 2.5.
/ 2.20 [Trono] Three truck drivers went into a roadside cafe. One truck driver purchased
four sandwiches, a cup of coffee, and ten doughnuts for $8.45. Another
driver purchased three sandwiches, a cup of coffee, and seven doughnuts for $6.30.
What did the third truck driver pay for a sandwich, a cup of coffee, and a doughnut?
/ 2.21 The Linear Combination Lemma says which equations can be gotten from 
Gaussian reduction of a given linear system. 

(1) Produce an equation not implied by this system. 

3x + 4y = 8 
2x+ y=3 

(2) Can any equation be derived from an inconsistent system?
2.22 [Hoffman & Kunze] Extend the definition of row equivalence to linear systems.
Under your definition, do equivalent systems have the same solution set?
/ 2.23 In this matrix 

/I 2 3X 
3 3 
\1 4 5/ 

the first and second columns add to the third. 

(a) Show that remains true under any row operation. 

(b) Make a conjecture. 

(c) Prove that it holds. 



Computer Algebra Systems 



The linear systems in this chapter are small enough that their solution by hand 
is easy. But large systems are easiest, and safest, to do on a computer. There 
are special purpose programs such as LINPACK for this job. Another popular 
tool is a general purpose computer algebra system, including both commercial 
packages such as Maple, Mathematica, or MATLAB, or free packages such as 
Sage. 

For example, in the Topic on Networks, we need to solve this. 



i0 - i1 - i2 = 0
i1 - i3 - i5 = 0
i2 - i4 + i5 = 0
i3 + i4 - i6 = 0
5i1 + 10i3 = 10
2i2 + 4i4 = 10
5i1 - 2i2 + 50i5 = 0



We could do this by hand but it would take a while and be error-prone. Using a 
computer is better. 

We illustrate by solving that system under Sage. 

sage: var('i0,i1,i2,i3,i4,i5,i6')
(i0, i1, i2, i3, i4, i5, i6)
sage: network_system=[i0-i1-i2==0, i1-i3-i5==0,
....:     i2-i4+i5==0, i3+i4-i6==0, 5*i1+10*i3==10,
....:     2*i2+4*i4==10, 5*i1-2*i2+50*i5==0]
sage: solve(network_system, i0,i1,i2,i3,i4,i5,i6)
[[i0 == (7/3), i1 == (2/3), i2 == (5/3), i3 == (2/3),
i4 == (5/3), i5 == 0, i6 == (7/3)]]

Magic. 

Here is the same system solved under Maple. We enter the array of coefficients 
and the vector of constants, and then we get the solution. 

> A:=array( [[1,-1,-1,0,0,0,0],
             [0,1,0,-1,0,-1,0],
             [0,0,1,0,-1,1,0],
             [0,0,0,1,1,0,-1],
             [0,5,0,10,0,0,0],
             [0,0,2,0,4,0,0],
             [0,5,-2,0,0,50,0]] );
> u:=array( [0,0,0,0,10,10,0] );
> linsolve(A,u);

  7  2  5  2  5     7
[ -, -, -, -, -, 0, - ]
  3  3  3  3  3     3

If a system has infinitely many solutions then the program will return a 
parametrization. 
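
Whatever program produces an answer, we can check it by substituting back. The Python sketch below does that for the solution reported above, in exact arithmetic; the names network_system, reported, and satisfies are our own, and the coefficient rows are copied from the array entered in the Maple session.

```python
from fractions import Fraction as F

# Each row: coefficients of i0, ..., i6, followed by the constant.
network_system = [
    [1, -1, -1, 0, 0, 0, 0, 0],
    [0, 1, 0, -1, 0, -1, 0, 0],
    [0, 0, 1, 0, -1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, -1, 0],
    [0, 5, 0, 10, 0, 0, 0, 10],
    [0, 0, 2, 0, 4, 0, 0, 10],
    [0, 5, -2, 0, 0, 50, 0, 0],
]

# The values that the computer algebra systems report for i0, ..., i6.
reported = [F(7, 3), F(2, 3), F(5, 3), F(2, 3), F(5, 3), F(0), F(7, 3)]

def satisfies(system, values):
    """True when the values make every equation of the system hold."""
    return all(sum(c * v for c, v in zip(row[:-1], values)) == row[-1]
               for row in system)

print(satisfies(network_system, reported))  # True
```

A check like this confirms that the reported values are a solution; it does not by itself show that they are the only one.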

Exercises 

1 Use the computer to solve the two problems that opened this chapter. 

(a) This is the Statics problem.

40h + 15c = 100
25c = 50 + 50h

(b) This is the Chemistry problem.

7h = 7j
8h + 1i = 5j + 2k
1i = 3j
3i = 6j + 1k

2 Use the computer to solve these systems from the first subsection, or conclude
'many solutions' or 'no solutions'.

(a) 2x + 2y = 5
    x - 4y = 0

(b) -x + y = 1
    x + y = 2

(c) x - 3y + z = 1
    x + y + 2z = 14

(d) -x - y = 1
    -3x - 3y = 2

(e) 4y + z = 20
    2x - 2y + z = 0
    x + z = 5
    x + y - z = 10

(f) 2x + z + w = 5
    y - w = -1
    3x - z - w = 0
    4x + y + 2z + w = 9

3 Use the computer to solve these systems from the second subsection.

(a) 3x + 6y = 18
    x + 2y = 6

(b) x + y = 1
    x - y = -1

(c) x1 + x3 = 4
    x1 - x2 + 2x3 = 5
    4x1 - x2 + 5x3 = 17

(d) 2a + b - c = 2
    2a + c = 3
    a - b = 0

(e) x + 2y - z = 3
    2x + y + w = 4
    x - y + z + w = 1

(f) x + z + w = 4
    2x + y - w = 2
    3x + y + z = 7

4 What does the computer give for the solution of the general 2×2 system?

ax + cy = p
bx + dy = q



Input-Output Analysis 



An economy is an immensely complicated network of interdependence. Changes 
in one part can ripple out to affect other parts. Economists have struggled to 
be able to describe, and to make predictions about, such a complicated object 
and mathematical models using systems of linear equations have emerged as a 
key tool. One is Input-Output Analysis, pioneered by W. Leontief, who won the 
1973 Nobel Prize in Economics. 

Consider an economy with many parts, two of which are the steel industry 
and the auto industry. These two interact tightly as they work to meet the 
demand for their product from other parts of the economy, that is, from users 
external to the steel and auto sectors. For instance, should the external demand
for autos go up, that would increase the auto industry's usage of steel. Or,
should the external demand for steel fall, then it would lead to a drop in steel's purchase
of autos. The type of Input-Output model that we will consider takes in the
external demands and then predicts how the two interact to meet those demands. 

We start with a listing of production and consumption statistics. (These 
numbers, giving dollar values in millions, are from [Leontief 1965], describing 
the 1958 U.S. economy. Today's statistics would be different, both because of 
inflation and because of technical changes in the industries.) 



                  used by   used by   used by
                  steel     auto      others    total

value of steel    5 395     2 664               25 448
value of auto        48     9 030               30 346


For instance, the dollar value of steel used by the auto industry in this year is
2,664 million. Note that industries may consume some of their own output.

We can fill in the blanks for the external demand. This year's value of the
steel used by others is 17,389 and this year's value of the auto used by others is
21,268. With that, we have a complete description of the external demands and
of how auto and steel interact, this year, to meet them.

Now, imagine that the external demand for steel has recently been going up
by 200 per year and so we estimate that next year it will be 17,589. We also
estimate that next year's external demand for autos will be down 25 to 21,243.
We wish to predict next year's total outputs.

That prediction isn't as simple as adding 200 to this year's steel total and 
subtracting 25 from this year's auto total. For one thing, a rise in steel will cause 
that industry to have an increased demand for autos, which will mitigate to 
some extent the loss in external demand for autos. On the other hand, the drop 
in external demand for autos will cause the auto industry to use less steel and so 
lessen somewhat the upswing in steel's business. In short, these two industries 
form a system, and we need to predict where the system as a whole will settle. 

We have these equations. 

next year's production of steel = next year's use of steel by steel 

+ next year's use of steel by auto 
+ next year's use of steel by others 

next year's production of autos = next year's use of autos by steel 

+ next year's use of autos by auto 
+ next year's use of autos by others 

On the left side put the unknowns: let s be next year's total production of steel
and a be next year's total output of autos. At the ends of the right sides
go our external demand estimates for next year, 17,589 and 21,243. For the
remaining four terms, we look to the table of this year's information about how
the industries interact.

For next year's use of steel by steel, we note that this year the steel industry 
used 5395 units of steel input to produce 25,448 units of steel output. So next 
year, when the steel industry will produce s units out, we expect that doing so 
will take s • (5395)/(25448) units of steel input — this is simply the assumption 
that input is proportional to output. (We are assuming that the ratio of input to 
output remains constant over time; in practice, models may try to take account 
of trends of change in the ratios.) 

Next year's use of steel by the auto industry is similar. This year, the auto
industry uses 2664 units of steel input to produce 30346 units of auto output. So
next year, when the auto industry's total output is a, we expect it to consume
a · (2664)/(30346) units of steel.

Filling in the other equation in the same way gives this system of linear
equations.

\[
\begin{aligned}
\frac{5395}{25448}\cdot s + \frac{2664}{30346}\cdot a + 17589 &= s\\
\frac{48}{25448}\cdot s + \frac{9030}{30346}\cdot a + 21243 &= a
\end{aligned}
\]

Gauss's Method

\[
\begin{aligned}
(20053/25448)s - (2664/30346)a &= 17589\\
-(48/25448)s + (21316/30346)a &= 21243
\end{aligned}
\]

gives s = 25698 and a = 30311.

Looking back, recall that above we described why the prediction of next
year's totals isn't as simple as adding 200 to last year's steel total and subtracting 
25 from last year's auto total. In fact, comparing these totals for next year to 
the ones given at the start for the current year shows that, despite the drop 
in external demand, the total production of the auto industry will rise. The 
increase in internal demand for autos caused by steel's sharp rise in business 
more than makes up for the loss in external demand for autos. 

One of the advantages of having a mathematical model is that we can ask 
"What if . . . ?" questions. For instance, we can ask "What if the estimates for 
next year's external demands are somewhat off?" To try to understand how 
much the model's predictions change in reaction to changes in our estimates, we 
can try revising our estimate of next year's external steel demand from 17,589
down to 17,489, while keeping the assumption of next year's external demand
for autos fixed at 21,243. The resulting system

    (20053/25448) s -  (2664/30346) a = 17489
      -(48/25448) s + (21316/30346) a = 21243

when solved gives s = 25,571 and a = 30,311. This is sensitivity analysis. We
are seeing how sensitive the predictions of our model are to the accuracy of the 
assumptions. 
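
As a rough check on the numbers above, here is a minimal C sketch that solves the two-equation steel-auto system with one row combination of Gauss's Method followed by back-substitution. The function names `solve_two_by_two` and `steel_auto` and the argument layout are assumptions made for this illustration; only the coefficients come from the system in the text.

```c
/* Solve  a11*s + a12*a = b1,  a21*s + a22*a = b2  by one row combination
   and back-substitution.  Returns 0 if a needed leading entry is zero
   (hypothetical helper; it does no row swaps). */
int solve_two_by_two(double a11, double a12, double b1,
                     double a21, double a22, double b2,
                     double *s, double *a)
{
    if (a11 == 0.0)
        return 0;
    double multiplier = a21 / a11;
    double pivot = a22 - multiplier * a12;   /* new row-2 coefficient of a */
    double rhs = b2 - multiplier * b1;       /* new row-2 right side */
    if (pivot == 0.0)
        return 0;
    *a = rhs / pivot;
    *s = (b1 - a12 * (*a)) / a11;
    return 1;
}

/* The steel-auto system, with next year's external steel demand as a
   parameter so that the estimate can be varied. */
int steel_auto(double steel_demand, double *s, double *a)
{
    return solve_two_by_two(20053.0 / 25448.0, -2664.0 / 30346.0, steel_demand,
                            -48.0 / 25448.0, 21316.0 / 30346.0, 21243.0,
                            s, a);
}
```

Calling `steel_auto` with 17589 and then with 17489 should give values that round to the totals quoted above, which makes it easy to try other "What if" estimates.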

Naturally, we can consider larger models that detail the interactions among 
more sectors of an economy; these models are typically solved on a computer. 
Naturally also, a single model does not suit every case and assuring that the 
assumptions underlying a model are reasonable for a particular prediction 
requires the judgments of experts. With those caveats however, this model has 
proven in practice to be a useful and accurate tool for economic analysis. For 
further reading, try [Leontief 1951] and [Leontief 1965]. 

Exercises 

Hint: these systems are easiest to solve on a computer. 

1 With the steel-auto system given above, estimate next year's total productions in 
these cases. 

(a) Next year's external demands are: up 200 from this year for steel, and 
unchanged for autos. 

(b) Next year's external demands are: up 100 for steel, and up 200 for autos. 

(c) Next year's external demands are: up 200 for steel, and up 200 for autos. 

2 In the steel-auto system, the ratio for the use of steel by the auto industry is
2664/30346, about 0.0878. Imagine that a new process for making autos reduces
this ratio to 0.0500.

(a) How will the predictions for next year's total productions change compared 
to the first example discussed above (i.e., taking next year's external demands 
to be 17,589 for steel and 21,243 for autos)? 

(b) Predict next year's totals if, in addition, the external demand for autos rises 
to be 21,500 because the new cars are cheaper. 

3 This table gives the numbers for the auto-steel system from a different year, 1947
(see [Leontief 1951]). The units here are billions of 1947 dollars.

                      used by   used by   used by
                      steel     auto      others    total
    value of steel     6.90      1.28                18.69
    value of autos     0         4.40                14.27

(a) Solve for total output if next year's external demands are: steel's demand up 
10% and auto's demand up 15%. 

(b) How do the ratios compare to those given above in the discussion for the 1958 
economy? 

(c) Solve the 1947 equations with the 1958 external demands (note the difference 
in units; a 1947 dollar buys about what $1.30 in 1958 dollars buys). How far off 
are the predictions for total output? 

4 Predict next year's total productions of each of the three sectors of the hypothetical
economy shown below

                         used by   used by   used by    used by
                         farm      rail      shipping   others    total
    value of farm         25        50        100                  800
    value of rail         25        50         50                  300
    value of shipping     15        10          0                  500


if next year's external demands are as stated. 

(a) 625 for farm, 200 for rail, 475 for shipping 

(b) 650 for farm, 150 for rail, 450 for shipping 

5 This table gives the interrelationships among three segments of an economy (see 
[Clark & Coupe]). 

                          used by   used by     used by   used by
                          food      wholesale   retail    others    total
    value of food           0        2318        4679                11869
    value of wholesale      393      1089        22459               122242
    value of retail         3        53          75                  116041

We will do an Input-Output analysis on this system. 

(a) Fill in the numbers for this year's external demands. 

(b) Set up the linear system, leaving next year's external demands blank. 

(c) Solve the system where we get next year's external demands by taking this 
year's external demands and inflating them 10%. Do all three sectors increase 
their total business by 10%? Do they all even increase at the same rate? 

(d) Solve the system where we get next year's external demands by taking this 
year's external demands and reducing them 7%. (The study from which these 
numbers come concluded that because of the closing of a local military facility, 
overall personal income in the area would fall 7%, so this might be a first guess 
at what would actually happen.) 



Accuracy of Computations 



Gauss's Method lends itself nicely to computerization. The code below illustrates.
It operates on an n×n matrix a, doing row combinations using the first row,
then the second row, etc.

    /* rows and columns are numbered 1..n, so a needs n+1 rows and columns */
    for (row=1; row<=n-1; row++) {
      for (row_below=row+1; row_below<=n; row_below++) {
        multiplier=a[row_below][row]/a[row][row];
        for (col=row; col<=n; col++) {
          a[row_below][col]-=multiplier*a[row][col];
        }
      }
    }

This is in the C language. The loop for(row=1;row<=n-1;row++){ .. } initializes row at
1 and then iterates while row is less than or equal to n - 1, each time through
incrementing row by one with the ++ operation. The other non-obvious language
construct is that the '-=' in the innermost loop amounts to the
a[row_below][col] = -multiplier · a[row][col] + a[row_below][col] operation.

This code provides a quick take on how to mechanize Gauss's Method. But
it is naive in many ways. For one thing, it assumes that the entry in the row, row
position is nonzero. One way that the code needs additional development to
make it practical is to cover the case where finding a zero in that location leads
to a row swap or to the conclusion that the matrix is singular.
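
One possible way to cover those cases is sketched below. It is only an illustration: the function name `reduce_with_swaps`, the zero-based indexing, the fixed bound `MAXN`, and the return convention are all assumptions introduced here, and the exact comparisons with zero are a simplification.

```c
#define MAXN 10

/* Reduce the n x n matrix a (rows and columns 0..n-1) to echelon form.
   When the entry in the row,row position is zero, look below it for a
   nonzero entry and swap rows.  Returns 1 on success, or 0 if some column
   has no usable entry, in which case the matrix is singular. */
int reduce_with_swaps(int n, double a[MAXN][MAXN])
{
    for (int row = 0; row < n; row++) {
        if (a[row][row] == 0.0) {
            int other = row + 1;
            while (other < n && a[other][row] == 0.0)
                other++;
            if (other == n)
                return 0;                     /* singular */
            for (int col = 0; col < n; col++) {
                double temp = a[row][col];
                a[row][col] = a[other][col];
                a[other][col] = temp;
            }
        }
        /* row combinations, as in the naive code */
        for (int row_below = row + 1; row_below < n; row_below++) {
            double multiplier = a[row_below][row] / a[row][row];
            for (int col = row; col < n; col++)
                a[row_below][col] -= multiplier * a[row][col];
        }
    }
    return 1;
}
```

Unlike the naive loop, this version also visits the last row, so a zero in the final diagonal position is reported as singular.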

Adding some if statements to cover those cases is not hard, but we will 
instead consider some other ways in which the code is naive. It is prone to 
pitfalls arising from the computer's reliance on finite-precision floating point 
arithmetic. 

For example, we have seen above that we must handle as a separate case a 
system that is singular. But systems that are nearly singular also require care. 
Consider this one. 

              x + 2y = 3
    1.00000001x + 2y = 3.00000001

We can easily spot the solution x = 1, y = 1. But a computer has more trouble. If
it represents real numbers to eight significant places, called single precision, then
it will represent the second equation internally as 1.0000000x + 2y = 3.0000000,
losing the digits in the ninth place. Instead of reporting the correct solution, this
computer will report something that is not even close: this computer thinks
that the system is singular because the two equations are represented internally
as equal.
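
The effect can be observed directly. The sketch below assumes that the C type `float` is an IEEE single-precision number, which carries roughly seven significant decimal digits (slightly fewer than the eight places used in the discussion above); the function name is made up for this illustration.

```c
/* Returns 1 if, once stored as floats, the second equation
   1.00000001x + 2y = 3.00000001 has the same x coefficient and the same
   constant as the first equation x + 2y = 3.  The volatile qualifier
   makes sure each value really is rounded to float precision. */
int equations_coincide_in_float(void)
{
    volatile float a1 = 1.0f,        c1 = 3.0f;
    volatile float a2 = 1.00000001f, c2 = 3.00000001f;
    return a1 == a2 && c1 == c2;
}
```

Under that assumption the function returns 1: the ninth-place digits are lost when the numbers are stored, so the two rows become identical.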

For some intuition about how the computer could come up with something 
that far off, we graph the system. 



At the scale that we have drawn this graph we cannot tell the two lines apart. 
This system is nearly singular in the sense that the two lines are nearly the
same line. Near-singularity gives this system the property that a small change
in the system can cause a large change in its solution; for instance, changing the 
3.00000001 to 3.00000003 changes the intersection point from (1,1) to (3,0). 
This system changes radically depending on a ninth digit, which explains why an 
eight-place computer has trouble. A problem that is very sensitive to inaccuracy 
or uncertainties in the input values is ill-conditioned. 
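
To see this sensitivity numerically, the following sketch solves the system in double precision, which has enough digits to tell the two lines apart, with the constant of the second equation left as a parameter. The helper `intersect` (Cramer's rule for two lines) and the wrapper `nearly_singular` are assumptions introduced for illustration.

```c
/* Intersection of a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule.
   Returns 0 if the determinant is exactly zero. */
int intersect(double a1, double b1, double c1,
              double a2, double b2, double c2,
              double *x, double *y)
{
    double det = a1 * b2 - b1 * a2;
    if (det == 0.0)
        return 0;
    *x = (c1 * b2 - b1 * c2) / det;
    *y = (a1 * c2 - c1 * a2) / det;
    return 1;
}

/* The nearly singular system from the text, with the constant of the
   second equation passed in so that it can be perturbed. */
int nearly_singular(double second_constant, double *x, double *y)
{
    return intersect(1.0, 2.0, 3.0,
                     1.00000001, 2.0, second_constant,
                     x, y);
}
```

Passing 3.00000001 should give a point very near (1, 1), while passing 3.00000003 should give a point very near (3, 0): a change in the ninth digit of one input moves the answer by whole units.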

The above example gives one way in which a system can be difficult to solve 
on a computer and it has the advantage that the picture of nearly-equal lines 
gives a memorable insight into one way that numerical difficulties can arise. 
Unfortunately this insight isn't useful when we wish to solve some large system. 
We cannot, typically, hope to understand the geometry of an arbitrary large 
system. In addition, there are ways that a computer's results may be unreliable 
other than that the angle between some of the linear surfaces is small. 

For an example, consider the system below, from [Hamming].

\[
\begin{aligned}
0.001x + y &= 1 \\
x - y &= 0
\end{aligned}
\tag{$*$}
\]

The second equation gives $x = y$, so $x = y = 1/1.001$ and thus both variables
have values that are just less than 1. A computer using two digits represents
the system internally in this way (we will do this example in two-digit floating
point arithmetic, but inventing a similar one with eight digits is easy).

\[
\begin{aligned}
(1.0 \times 10^{-3})x + (1.0 \times 10^{0})y &= 1.0 \times 10^{0} \\
(1.0 \times 10^{0})x - (1.0 \times 10^{0})y &= 0.0 \times 10^{0}
\end{aligned}
\]

The computer's row reduction step $-1000\rho_1 + \rho_2$ produces a second equation
$-1001y = -1000$, which the computer rounds to two places as $(-1.0 \times 10^{3})y =
-1.0 \times 10^{3}$. Then the computer decides from the second equation that $y = 1$
and from the first equation that $x = 0$. This $y$ value is fairly good, but the $x$
is very bad. Thus, another cause of unreliable output is a mixture of floating
point arithmetic and a reliance on using leading entries that are small.

An experienced programmer may respond by going to double precision
that retains sixteen significant digits. This will indeed solve many problems.
However, double precision has twice the memory requirements and besides,
we can obviously tweak the above systems to give the same trouble in the
seventeenth digit, so double precision isn't a panacea. What we need is a strategy to
minimize the numerical trouble arising from solving systems on a computer as
well as some guidance as to how far we can trust the reported solutions.

A basic improvement on the naive code above is to not simply take the entry 
in the row, row position to determine the factor to use for the row combination, 
but rather to look at all of the entries in the row column below the row, row 
entry and take the one that is most likely to give reliable results (e.g., take one 
that is not too small). This is partial pivoting. 

For example, to solve the troublesome system (*) above, we start by looking
at both equations for a best entry to use, and taking the 1 in the second
equation as more likely to give good results. Then, the combination step of
$-0.001\rho_2 + \rho_1$ gives a first equation of $1.001y = 1$, which the computer will
represent as $(1.0 \times 10^{0})y = 1.0 \times 10^{0}$, leading to the conclusion that $y = 1$ and,
after back-substitution, $x = 1$, both of which are close to right. We can adapt
the code from above to this purpose.

    for (row=1; row<=n-1; row++) {
      /* find the largest entry in this column (in row max) */
      max=row;
      for (row_below=row+1; row_below<=n; row_below++) {
        /* fabs() is the floating point absolute value, from <math.h> */
        if (fabs(a[row_below][row]) > fabs(a[max][row]))
          max=row_below;
      }
      /* swap rows to move that best entry up */
      for (col=row; col<=n; col++) {
        temp=a[row][col];
        a[row][col]=a[max][col];
        a[max][col]=temp;
      }
      /* proceed as before */
      for (row_below=row+1; row_below<=n; row_below++) {
        multiplier=a[row_below][row]/a[row][row];
        for (col=row; col<=n; col++) {
          a[row_below][col]-=multiplier*a[row][col];
        }
      }
    }

A full analysis of the best way to implement Gauss's Method is outside the 
scope of the book (see [Wilkinson 1965]), but the method recommended by most 
experts first finds the best entry among the candidates and then scales it to a 
number that is less likely to give trouble. This is scaled partial pivoting. 

In addition to returning a result that is likely to be reliable, most well-done
code will return a number, the conditioning number, that describes the factor
by which uncertainties in the input numbers could be magnified to become
inaccuracies in the results returned (see [Rice]).

The lesson is that, just because Gauss's Method always works in theory, and 
just because computer code correctly implements that method, doesn't mean 
that the answer is reliable. In practice, always use a package where experts have 
worked hard to counter what can go wrong.

Exercises

1 Using two decimal places, add 253 and 2/3.

2 This intersect-the-lines problem contrasts with the example discussed above.

\[
\begin{aligned}
x + 2y &= 3 \\
3x - 2y &= 1
\end{aligned}
\]

Illustrate that in this system some small change in the numbers will produce only
a small change in the solution by changing the constant in the bottom equation to
1.008 and solving. Compare it to the solution of the unchanged system.

3 Solve this system by hand ([Rice]).

\[
\begin{aligned}
0.0003x + 1.556y &= 1.569 \\
0.3454x - 2.346y &= 1.018
\end{aligned}
\]

(a) Solve it accurately, by hand. (b) Solve it by rounding at each step to
four significant digits.

4 Rounding inside the computer often has an effect on the result. Assume that your
machine has eight significant digits.

(a) Show that the machine will compute (2/3) + ((2/3) - (1/3)) as unequal to
((2/3) + (2/3)) - (1/3). Thus, computer arithmetic is not associative.

(b) Compare the computer's version of (1/3)x + y = 0 and (2/3)x + 2y = 0. Is
twice the first equation the same as the second?

5 Ill-conditioning is not only dependent on the matrix of coefficients. This example
[Hamming] shows that it can arise from an interaction between the left and right
sides of the system. Let $\varepsilon$ be a small real.

\[
\begin{aligned}
3x + 2y + z &= 6 \\
2x + 2\varepsilon y + 2\varepsilon z &= 2 + 4\varepsilon \\
x + 2\varepsilon y - \varepsilon z &= 1 + \varepsilon
\end{aligned}
\]

(a) Solve the system by hand. Notice that the $\varepsilon$'s divide out only because there is
an exact cancellation of the integer parts on the right side as well as on the left.

(b) Solve the system by hand, rounding to two decimal places, and with $\varepsilon = 0.001$.



Analyzing Networks 



The diagram below shows some of a car's electrical network. The battery is on
the left, drawn as stacked line segments. The wires are lines, shown straight and
with sharp right angles for neatness. Each light is a circle enclosing a loop.

[Figure: part of a car's electrical network, with branches labeled Brake Lights, Parking Lights, Rear Lights, and Headlights.]

The designer of such a network needs to answer questions like: How much
electricity flows when both the hi-beam headlights and the brake lights are on?
We will use linear systems to analyze simple electrical networks.

For the analysis we need two facts about electricity and two facts about
electrical networks.

The first fact about electricity is that a battery is like a pump, providing a
force impelling the electricity to flow, if there is a path. We say that the battery
provides a 'potential' to flow. For instance, when the driver steps on the brake
then the switch makes contact and so makes a circuit on the left side of the
diagram, so the battery's force creates a current flowing through that circuit to
turn on the brake lights.

The second electrical fact is that in some kinds of network components the
amount of flow is proportional to the force provided by the battery. That is, for
each such component there is a number, its resistance, such that the potential
is equal to the flow times the resistance. Potential is measured in volts, the
rate of flow is in amperes, and resistance to the flow is in ohms; these units are
defined so that volts = amperes · ohms.

Components with this property, that the voltage-amperage response curve is a
line through the origin, are resistors. For example, if a resistor measures 2 ohms
then wiring it to a 12 volt battery results in a flow of 6 amperes. Conversely, if
electrical current of 2 amperes flows through that resistor then there must be
a 4 volt potential difference between its ends. This is the voltage drop across
the resistor. One way to think of the electrical circuits that we consider here is
that the battery provides a voltage rise while the other components are voltage
drops.

The two facts that we need about networks are Kirchhoff's Laws. 

Current Law. For any point in a network, the flow in equals the flow out. 

Voltage Law. Around any circuit the total drop equals the total rise. 

We start with the network below. It has a battery that provides the potential
to flow and three resistors, drawn as zig-zags. When components are wired one
after another, as here, they are in series.

[Figure: a 20 volt battery wired in series with resistors of 2 ohms, 5 ohms, and 3 ohms.]

By Kirchhoff's Voltage Law, because the voltage rise is 20 volts, the total voltage
drop must also be 20 volts. Since the resistance from start to finish is 10 ohms
(the resistance of the wire connecting the components is negligible), the current
is (20/10) = 2 amperes. Now, by Kirchhoff's Current Law, there are 2 amperes
through each resistor. Therefore the voltage drops are: 4 volts across the 2 ohm
resistor, 10 volts across the 5 ohm resistor, and 6 volts across the 3 ohm resistor.
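
The series computation is short enough to sketch in a few lines of C. The function name `series_circuit` and its array-based interface are assumptions made for this illustration.

```c
/* For a battery of the given voltage wired in series with n resistors,
   return the current in amperes and store the voltage drop across each
   resistor in drops[].  Wire resistance is taken to be negligible. */
double series_circuit(double volts, const double ohms[], int n, double drops[])
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += ohms[i];                  /* series resistances add */
    double current = volts / total;        /* Voltage Law: total drop equals the rise */
    for (int i = 0; i < n; i++)
        drops[i] = current * ohms[i];      /* Current Law: same flow through each resistor */
    return current;
}
```

For 20 volts and resistances of 2, 5, and 3 ohms this returns a current of 2 amperes and drops of 4, 10, and 6 volts.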

The prior network is simple enough that we didn't use a linear system but
the next one is more complicated. Here the resistors are in parallel.

[Figure: a 20 volt battery with a 12 ohm resistor and an 8 ohm resistor wired in parallel.]

We begin by labeling the branches as below. Let the current through the left
branch of the parallel portion be $i_1$ and that through the right branch be $i_2$,
and also let the current through the battery be $i_0$. Note that we don't need to
know the actual direction of flow: if current flows in the direction opposite to
our arrow then we will get a negative number in the solution.

[Figure: the same network with arrows labeling the currents $i_0$, $i_1$, and $i_2$.]

The Current Law, applied to the point in the upper right where the flow $i_0$
meets $i_1$ and $i_2$, gives that $i_0 = i_1 + i_2$. Applied to the lower right it gives
$i_1 + i_2 = i_0$. In the circuit that loops out of the top of the battery, down the
left branch of the parallel portion, and back into the bottom of the battery,
the voltage rise is 20 while the voltage drop is $i_1 \cdot 12$, so the Voltage Law gives
that $12i_1 = 20$. Similarly, the circuit from the battery to the right branch and
back to the battery gives that $8i_2 = 20$. And, in the circuit that simply loops
around in the left and right branches of the parallel portion (taken clockwise,
arbitrarily), there is a voltage rise of 0 and a voltage drop of $8i_2 - 12i_1$ so the
Voltage Law gives that $8i_2 - 12i_1 = 0$.

\[
\begin{aligned}
i_0 - i_1 - i_2 &= 0 \\
-i_0 + i_1 + i_2 &= 0 \\
12i_1 &= 20 \\
8i_2 &= 20 \\
-12i_1 + 8i_2 &= 0
\end{aligned}
\]

The solution is $i_0 = 25/6$, $i_1 = 5/3$, and $i_2 = 5/2$, all in amperes. (Incidentally,
this illustrates that redundant equations can arise in practice.)
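
Because two of the five equations are redundant, three of them already determine the currents, and the remaining ones can serve as a check. The sketch below does that; the function names `parallel_currents` and `largest_residual` are assumptions introduced for this illustration.

```c
#include <math.h>

/* Currents in the two-branch parallel network: a battery of the given
   voltage across resistors r1 and r2.  The Voltage Law around each
   battery loop gives i1 and i2; the Current Law then gives i0. */
void parallel_currents(double volts, double r1, double r2,
                       double *i0, double *i1, double *i2)
{
    *i1 = volts / r1;
    *i2 = volts / r2;
    *i0 = *i1 + *i2;
}

/* Largest absolute residual of the five equations for the 20 volt,
   12 ohm, 8 ohm network, evaluated at the given currents.  It should
   be near zero at a solution. */
double largest_residual(double i0, double i1, double i2)
{
    double residual[5] = {
        i0 - i1 - i2,
        -i0 + i1 + i2,
        12.0 * i1 - 20.0,
        8.0 * i2 - 20.0,
        -12.0 * i1 + 8.0 * i2
    };
    double worst = 0.0;
    for (int k = 0; k < 5; k++)
        if (fabs(residual[k]) > worst)
            worst = fabs(residual[k]);
    return worst;
}
```

With 20 volts, 12 ohms, and 8 ohms the computed currents agree with 25/6, 5/3, and 5/2 amperes up to rounding, and all five residuals are negligible.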

Kirchhoff's laws can establish the electrical properties of very complex networks.
The next diagram shows five resistors, wired in a series-parallel way.

[Figure: a 10 volt battery connected to a network of five resistors.]

This network is a Wheatstone bridge (see Exercise 4). To analyze it, we can
place the arrows in this way.

[Figure: the same network with arrows labeling the currents $i_0$ through $i_5$.]

Kirchhoff's Current Law, applied to the top node, the left node, the right node,
and the bottom node gives these.

\[
\begin{aligned}
i_0 &= i_1 + i_2 \\
i_1 &= i_3 + i_5 \\
i_2 + i_5 &= i_4 \\
i_3 + i_4 &= i_0
\end{aligned}
\]

Kirchhoff's Voltage Law, applied to the inside loop (the $i_0$ to $i_1$ to $i_3$ to $i_0$ loop),
the outside loop, and the upper loop not involving the battery, gives these.

\[
\begin{aligned}
5i_1 + 10i_3 &= 10 \\
2i_2 + 4i_4 &= 10 \\
5i_1 + 50i_5 - 2i_2 &= 0
\end{aligned}
\]

Those suffice to determine the solution $i_0 = 7/3$, $i_1 = 2/3$, $i_2 = 5/3$, $i_3 = 2/3$,
$i_4 = 5/3$, and $i_5 = 0$.
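
A system of this size is a natural job for Gauss's Method with partial pivoting. The sketch below is one way to set it up, under stated assumptions: the routine `gauss_solve`, its row-major augmented-matrix layout, the tolerance `1e-12`, and the decision to drop the fourth Current Law equation (it follows from the other three) are all choices made here for illustration.

```c
#include <math.h>

/* Solve an n x n system given as an n x (n+1) augmented matrix stored
   row by row in aug[].  Uses partial pivoting and back-substitution.
   Returns 1 on success, 0 if a pivot is too small to trust. */
int gauss_solve(int n, double *aug, double *x)
{
    int cols = n + 1;
    for (int row = 0; row < n; row++) {
        int max = row;
        for (int r = row + 1; r < n; r++)
            if (fabs(aug[r * cols + row]) > fabs(aug[max * cols + row]))
                max = r;
        if (fabs(aug[max * cols + row]) < 1e-12)
            return 0;
        for (int c = 0; c < cols; c++) {
            double temp = aug[row * cols + c];
            aug[row * cols + c] = aug[max * cols + c];
            aug[max * cols + c] = temp;
        }
        for (int r = row + 1; r < n; r++) {
            double multiplier = aug[r * cols + row] / aug[row * cols + row];
            for (int c = row; c < cols; c++)
                aug[r * cols + c] -= multiplier * aug[row * cols + c];
        }
    }
    for (int row = n - 1; row >= 0; row--) {
        double sum = aug[row * cols + n];
        for (int c = row + 1; c < n; c++)
            sum -= aug[row * cols + c] * x[c];
        x[row] = sum / aug[row * cols + row];
    }
    return 1;
}

/* Wheatstone bridge from the text; unknowns are i0..i5 in that order. */
int wheatstone_currents(double current[6])
{
    double aug[6 * 7] = {
        1, -1, -1,  0,  0,  0,   0,   /* i0 = i1 + i2          */
        0,  1,  0, -1,  0, -1,   0,   /* i1 = i3 + i5          */
        0,  0,  1,  0, -1,  1,   0,   /* i2 + i5 = i4          */
        0,  5,  0, 10,  0,  0,  10,   /* 5i1 + 10i3 = 10       */
        0,  0,  2,  0,  4,  0,  10,   /* 2i2 + 4i4 = 10        */
        0,  5, -2,  0,  0, 50,   0    /* 5i1 + 50i5 - 2i2 = 0  */
    };
    return gauss_solve(6, aug, current);
}
```

The computed currents should match the solution stated above up to rounding, with the middle current essentially zero.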

We can understand many kinds of networks in this way. For instance, the 
exercises analyze some networks of streets. 

Exercises 

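1 Calculate the amperages in each part of each network.

(a) This is a simple network.

[Figure: a 9 volt battery in a network with resistors of 3 ohms, 2 ohms, and 2 ohms.]

(b) Compare this one with the parallel case discussed above.

[Figure: a 9 volt battery in a network with resistors of 3 ohms, 2 ohms, 2 ohms, and 2 ohms.]

(c) This is a reasonably complicated network.

[Figure: the network for part (c).]
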

2 In the first network that we analyzed, with the three resistors in series, we just
added to get that they acted together like a single resistor of 10 ohms. We can do
a similar thing for parallel circuits. In the second circuit analyzed,

[Figure: a 20 volt battery with a 12 ohm resistor and an 8 ohm resistor wired in parallel.]

the electric current through the battery is 25/6 amperes. Thus, the parallel portion
is equivalent to a single resistor of 20/(25/6) = 4.8 ohms.

(a) What is the equivalent resistance if we change the 12 ohm resistor to 5 ohms?

(b) What is the equivalent resistance if the two are each 8 ohms?

(c) Find the formula for the equivalent resistance if the two resistors in parallel
are $r_1$ ohms and $r_2$ ohms.

3 For the car dashboard example that opens this Topic, solve for these amperages 
(assume that all resistances are 2 ohms). 

(a) If the driver is stepping on the brakes, so the brake lights are on, and no other 
circuit is closed. 

(b) If the hi-beam headlights and the brake lights are on. 

4 Show that, in this Wheatstone Bridge,

[Figure: a Wheatstone bridge with resistances $r_1$, $r_2$, $r_3$, $r_4$, and, in the middle, $r_g$.]

$r_2/r_1$ equals $r_4/r_3$ if and only if the current flowing through $r_g$ is zero. (In practice,
we place an unknown resistance at $r_4$. At $r_g$ we place a meter that shows the
current. We vary the three resistances $r_1$, $r_2$, and $r_3$ (typically they each have
a calibrated knob) until the current in the middle reads 0, and then the above
equation gives the value of $r_4$.)

There are networks other than electrical ones, and we can ask how well Kirchhoff's
laws apply to them. The remaining questions consider an extension to
networks of streets.

5 Consider this traffic circle.

[Figure: a traffic circle joining Main Street, North Avenue, and Pier Boulevard.]

This is the traffic volume, in units of cars per five minutes.

                North   Pier   Main
    into         100    150     25
    out of        75    150     50

We can set up equations to model how the traffic flows.

(a) Adapt Kirchhoff's Current Law to this circumstance. Is it a reasonable
modeling assumption?

(b) Label the three between-road arcs in the circle with a variable. Using the
(adapted) Current Law, for each of the three in-out intersections state an equation
describing the traffic flow at that node.

(c) Solve that system.

(d) Interpret your solution.

(e) Restate the Voltage Law for this circumstance. How reasonable is it?

6 This is a network of streets.

[Figure: street map showing Winooski Ave (running west to east), Willow, Jay Ln, and Shelburne St.]

We can observe the hourly flow of cars into this network's entrances, and out of its
exits.

              east Winooski   west Winooski   Willow   Jay   Shelburne
    into           80              50           65      -       40
    out of         30               5           70     55       75

(Note that to reach Jay a car must enter the network via some other road first, 
which is why there is no 'into Jay' entry in the table. Note also that over a long 
period of time, the total in must approximately equal the total out, which is why 
both rows add to 235 cars.) Once inside the network, the traffic may flow in different 
ways, perhaps filling Willow and leaving Jay mostly empty, or perhaps flowing in 
some other way. Kirchhoff's Laws give the limits on that freedom. 

(a) Determine the restrictions on the flow inside this network of streets by setting
up a variable for each block, establishing the equations, and solving them. Notice
that some streets are one-way only. (Hint: this will not yield a unique solution,
since traffic can flow through this network in various ways; you should get at
least one free variable.)

(b) Suppose that someone proposes construction for Winooski Avenue East between
Willow and Jay, and traffic on that block will be reduced. What is the least
amount of traffic flow that we can allow on that block without disrupting
the hourly flow into and out of the network?



Vector Spaces 



The first chapter began by introducing Gauss's Method and finished with a
fair understanding, keyed on the Linear Combination Lemma, of how it finds
the solution set of a linear system. Gauss's Method systematically takes linear
combinations of the rows. With that insight, we now move to a general study of
linear combinations.

We need a setting. At times in the first chapter we've combined vectors
from $\mathbb{R}^2$, at other times vectors from $\mathbb{R}^3$, and at other times vectors from even
higher-dimensional spaces. So our first impulse might be to work in $\mathbb{R}^n$, leaving
n unspecified. This would have the advantage that any of the results would hold
for $\mathbb{R}^2$ and for $\mathbb{R}^3$ and for many other spaces, simultaneously.

But if having the results apply to many spaces at once is advantageous then
sticking only to $\mathbb{R}^n$'s is overly restrictive. We'd like the results to also apply to
combinations of row vectors, as in the final section of the first chapter. We've
even seen some spaces that are not just a collection of all of the same-sized column
vectors or row vectors. For instance, we've seen an example of a homogeneous
system's solution set that is a plane, inside of $\mathbb{R}^3$. This solution set is a closed
system in the sense that a linear combination of these solutions is also a solution.
But it is not just a collection of all of the three-tall column vectors; only some
of them are in the set.

We want the results about linear combinations to apply anywhere that linear
combinations make sense. We shall call any such set a vector space. Our results,
instead of being phrased as "Whenever we have a collection in which we can
sensibly take linear combinations . . . ", will be stated as "In any vector space
. . . ".

Such a statement describes at once what happens in many spaces. To
understand the advantages of moving from studying a single space at a time to 
studying a class of spaces, consider this analogy. Imagine that the government 
made laws one person at a time: "Leslie Jones can't jay walk." That would be 
a bad idea; statements have the virtue of economy when they apply to many 
cases at once. Or suppose that they ruled, "Kim Ke must stop when passing an 
accident." Contrast that with, "Any doctor must stop when passing an accident." 
More general statements, in some ways, are clearer. 

I Definition of Vector Space

We shall study structures with two operations, an addition and a scalar multi- 
plication, that are subject to some simple conditions. We will reflect more on 
the conditions later, but on first reading notice how reasonable they are. For 
instance, surely any operation that can be called an addition (e.g., column vector 
addition, row vector addition, or real number addition) will satisfy conditions 
(1) through (5) below. 



1.1 Definition and Examples 

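1.1 Definition A vector space (over $\mathbb{R}$) consists of a set $V$ along with two
operations '+' and '·' subject to these conditions.

Where $\vec{v}, \vec{w} \in V$, (1) their vector sum $\vec{v} + \vec{w}$ is an element of $V$. If $\vec{u}, \vec{v}, \vec{w} \in V$
then (2) $\vec{v} + \vec{w} = \vec{w} + \vec{v}$ and (3) $(\vec{v} + \vec{w}) + \vec{u} = \vec{v} + (\vec{w} + \vec{u})$. (4) There is a zero
vector $\vec{0} \in V$ such that $\vec{v} + \vec{0} = \vec{v}$ for all $\vec{v} \in V$. (5) Each $\vec{v} \in V$ has an additive
inverse $\vec{w} \in V$ such that $\vec{w} + \vec{v} = \vec{0}$.

If $r, s$ are scalars, members of $\mathbb{R}$, and $\vec{v}, \vec{w} \in V$ then (6) each scalar multiple
$r \cdot \vec{v}$ is in $V$. If $r, s \in \mathbb{R}$ and $\vec{v}, \vec{w} \in V$ then (7) $(r + s) \cdot \vec{v} = r \cdot \vec{v} + s \cdot \vec{v}$, and
(8) $r \cdot (\vec{v} + \vec{w}) = r \cdot \vec{v} + r \cdot \vec{w}$, and (9) $(rs) \cdot \vec{v} = r \cdot (s \cdot \vec{v})$, and (10) $1 \cdot \vec{v} = \vec{v}$.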

1.2 Remark The definition involves two kinds of addition and two kinds of 
multiplication and so may at first seem confused. For instance, in condition (7) 
the '+' on the left is addition between two real numbers while the '+' on the right 
represents vector addition in V. These expressions aren't ambiguous because, 
for example, r and s are real numbers so 'r + s' can only mean real number 
addition. 

The best way to go through the examples below is to check all ten conditions 
in the definition. We write that check out at length in the first example. Use it 
as a model for the others. Especially important are the closure conditions, (1) 
and (6). They specify that the addition and scalar multiplication operations are 
always sensible — they are defined for every pair of vectors and every scalar and 
vector, and the result of the operation is a member of the set (see Example 1.4). 

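1.3 Example The set $\mathbb{R}^2$ is a vector space if the operations '+' and '·' have their
usual meaning.

We shall check all of the conditions.

There are five conditions in the paragraph having to do with addition. For
(1), closure of addition, note that for any $v_1, v_2, w_1, w_2 \in \mathbb{R}$ the result of the
sum

\[
\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} + \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
= \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \end{pmatrix}
\]

is a column array with two real entries, and so is in $\mathbb{R}^2$. For (2), that addition
of vectors commutes, take all entries to be real numbers and compute

\[
\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} + \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
= \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \end{pmatrix}
= \begin{pmatrix} w_1 + v_1 \\ w_2 + v_2 \end{pmatrix}
= \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
\]

(the second equality follows from the fact that the components of the vectors are
real numbers, and the addition of real numbers is commutative). Condition (3),
associativity of vector addition, is similar.

\[
\left( \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} + \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} \right) + \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}
= \begin{pmatrix} (v_1 + w_1) + u_1 \\ (v_2 + w_2) + u_2 \end{pmatrix}
= \begin{pmatrix} v_1 + (w_1 + u_1) \\ v_2 + (w_2 + u_2) \end{pmatrix}
= \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} + \left( \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \right)
\]

For the fourth condition we must produce a zero element; the vector of zeroes
serves.

\[
\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \end{pmatrix}
= \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
\]

For (5), to produce an additive inverse, note that for any $v_1, v_2 \in \mathbb{R}$ we have

\[
\begin{pmatrix} -v_1 \\ -v_2 \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\]

so the first vector is the desired additive inverse of the second.

The checks for the five conditions having to do with scalar multiplication are
similar. For (6), closure under scalar multiplication, where $r, v_1, v_2 \in \mathbb{R}$,

\[
r \cdot \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} rv_1 \\ rv_2 \end{pmatrix}
\]

is a column array with two real entries, and so is in $\mathbb{R}^2$. Next, this checks (7).

\[
(r + s) \cdot \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
= \begin{pmatrix} (r + s)v_1 \\ (r + s)v_2 \end{pmatrix}
= \begin{pmatrix} rv_1 + sv_1 \\ rv_2 + sv_2 \end{pmatrix}
= r \cdot \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} + s \cdot \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
\]

For (8), that scalar multiplication distributes from the left over vector addition,
we have this.

\[
r \cdot \left( \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} + \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} \right)
= \begin{pmatrix} r(v_1 + w_1) \\ r(v_2 + w_2) \end{pmatrix}
= \begin{pmatrix} rv_1 + rw_1 \\ rv_2 + rw_2 \end{pmatrix}
= r \cdot \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} + r \cdot \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
\]

The ninth

\[
(rs) \cdot \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
= \begin{pmatrix} (rs)v_1 \\ (rs)v_2 \end{pmatrix}
= \begin{pmatrix} r(sv_1) \\ r(sv_2) \end{pmatrix}
= r \cdot \left( s \cdot \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \right)
\]

and tenth conditions are also straightforward.

\[
1 \cdot \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
= \begin{pmatrix} 1v_1 \\ 1v_2 \end{pmatrix}
= \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
\]

In a similar way, each $\mathbb{R}^n$ is a vector space with the usual operations of vector
addition and scalar multiplication. (In $\mathbb{R}^1$, we usually do not write the members
as column vectors, i.e., we usually do not write '$(\pi)$'. Instead we just write '$\pi$'.)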

1.4 Example This subset of that is a plane through the origin 



x + y + z = 0} 



is a vector space if '+' and '-'are interpreted in this way. 







(-2 




• 








[-2 










( rx 
















, rz 



The addition and scalar multiplication operations here are just the ones of M^, 
reused on its subset P. We say that P inherits these operations from M.^. This 
example of an addition in P 











(i) 


* 


it 


■(i) 











illustrates that P is closed under addition. We've added two vectors from P — 
that is, with the property that the sum of their three entries is zero — and the 
result is a vector also in P. Of course, this example of closure is not a proof of 
closure. To prove that P is closed under addition, take two elements of P. 







$$ \begin{pmatrix} x_1\\ y_1\\ z_1\end{pmatrix}, \quad \begin{pmatrix} x_2\\ y_2\\ z_2\end{pmatrix} $$

Membership in P means that $x_1 + y_1 + z_1 = 0$ and $x_2 + y_2 + z_2 = 0$. Observe
that their sum

$$ \begin{pmatrix} x_1+x_2\\ y_1+y_2\\ z_1+z_2\end{pmatrix} $$

is also in P since its entries add $(x_1 + x_2) + (y_1 + y_2) + (z_1 + z_2) = (x_1 + y_1 + z_1) + (x_2 + y_2 + z_2)$ to 0. To show that P is closed under scalar multiplication,
start with a vector from P

$$ \begin{pmatrix} x\\ y\\ z\end{pmatrix} $$

where $x + y + z = 0$, and then for $r \in \mathbb{R}$ observe that the scalar multiple

$$ r\cdot\begin{pmatrix} x\\ y\\ z\end{pmatrix} = \begin{pmatrix} rx\\ ry\\ rz\end{pmatrix} $$

gives $rx + ry + rz = r(x + y + z) = 0$. Thus the two closure conditions are
satisfied. Verification of the other conditions in the definition of a vector space
is just as straightforward.
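
The closure argument for the plane can also be spot-checked numerically. The following is a minimal sketch, not a proof: it assumes vectors are stored as Python tuples, uses exact fractions to avoid rounding, and the helper names (`in_plane`, `add`, `scale`, `random_member`) and the sample vectors are illustrative choices rather than anything fixed by the text.

```python
from fractions import Fraction
import random

def in_plane(v):
    """Membership test for P = {(x, y, z) | x + y + z = 0}."""
    return sum(v) == 0

def add(v, w):
    """Entry-by-entry addition, the operation inherited from R^3."""
    return tuple(a + b for a, b in zip(v, w))

def scale(r, v):
    """Scalar multiplication, also inherited from R^3."""
    return tuple(r * a for a in v)

def random_member(rng):
    # Choose x and y freely; z is then forced by x + y + z = 0.
    x = Fraction(rng.randint(-9, 9), rng.randint(1, 9))
    y = Fraction(rng.randint(-9, 9), rng.randint(1, 9))
    return (x, y, -x - y)

rng = random.Random(0)
closed = True
for _ in range(1000):
    v, w = random_member(rng), random_member(rng)
    r = Fraction(rng.randint(-9, 9), rng.randint(1, 9))
    closed = closed and in_plane(add(v, w)) and in_plane(scale(r, v))

# One concrete sum of two members of P (sample vectors chosen here).
example_sum = add((1, 1, -2), (-1, 0, 1))
```

Passing such a check for many random samples is only evidence; the algebraic argument is what establishes closure for every member of P.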

1.5 Example Example 1.3 shows that the set of all two-tall vectors with real
entries is a vector space. Example 1.4 gives a subset of an $\mathbb{R}^n$ that is also a
vector space. In contrast with those two, consider the set of two-tall columns
with entries that are integers (under the obvious operations). This is a subset of
a vector space but it is not itself a vector space. The reason is that this set is
not closed under scalar multiplication, that is, it does not satisfy condition (6).
Here is a column with integer entries and a scalar such that the outcome of the
operation

$$ 0.5\cdot\begin{pmatrix} 4\\ 3\end{pmatrix} = \begin{pmatrix} 2\\ 1.5\end{pmatrix} $$

is not a member of the set, since its entries are not all integers.

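
A single counterexample is enough to show that closure fails, and it is easy to compute one. The sketch below assumes columns are stored as tuples and uses exact fractions; the column (4, 3) and the helper names are illustrative choices.

```python
from fractions import Fraction

def has_integer_entries(v):
    """True when every entry of the column is an integer."""
    return all(Fraction(a).denominator == 1 for a in v)

def scale(r, v):
    """Scalar multiplication inherited from R^2."""
    return tuple(Fraction(r) * a for a in v)

v = (4, 3)                          # a column with integer entries
half_v = scale(Fraction(1, 2), v)   # the scalar multiple 0.5 * v
```

Here `v` passes the membership test while `half_v` does not, so the set of integer columns fails condition (6).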
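1.6 Example The singleton set

$$ \{ \begin{pmatrix} 0\\ 0\\ 0\\ 0\end{pmatrix} \} $$

is a vector space under the operations

$$ \begin{pmatrix} 0\\ 0\\ 0\\ 0\end{pmatrix} + \begin{pmatrix} 0\\ 0\\ 0\\ 0\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\\ 0\end{pmatrix} \qquad r\cdot\begin{pmatrix} 0\\ 0\\ 0\\ 0\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\\ 0\end{pmatrix} $$

that it inherits from $\mathbb{R}^4$.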

A vector space must have at least one element, its zero vector. Thus a 
one-element vector space is the smallest possible.

1.7 Definition A one-element vector space is a trivial space.

The examples so far involve sets of column vectors with the usual operations. 
But vector spaces need not be collections of column vectors, or even of row 
vectors. Below are some other types of vector spaces. The term 'vector space' 
does not mean 'collection of columns of reals'. It means something more like 
'collection in which any linear combination is sensible'. 

1.8 Example Consider $\mathcal{P}_3 = \{a_0 + a_1x + a_2x^2 + a_3x^3 \mid a_0, \ldots, a_3 \in \mathbb{R}\}$, the set
of polynomials of degree three or less (in this book, we'll take constant polynomials,
including the zero polynomial, to be of degree zero). It is a vector space
under the operations

$$ (a_0 + a_1x + a_2x^2 + a_3x^3) + (b_0 + b_1x + b_2x^2 + b_3x^3) = (a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2 + (a_3 + b_3)x^3 $$

and

$$ r\cdot(a_0 + a_1x + a_2x^2 + a_3x^3) = (ra_0) + (ra_1)x + (ra_2)x^2 + (ra_3)x^3 $$

(the verification is easy). This vector space is worthy of attention because these
are the polynomial operations familiar from high school algebra. For instance,
$3\cdot(1 - 2x + 3x^2 - 4x^3) - 2\cdot(2 - 3x + x^2 - (1/2)x^3) = -1 + 7x^2 - 11x^3$.
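
These two operations act coefficient by coefficient, which makes them easy to sketch in code. The following assumes a polynomial of degree three or less is stored as a list `[a0, a1, a2, a3]`; the function names are illustrative, and the coefficient of the cubic term in the second polynomial is taken to be -1/2, the value consistent with the stated result.

```python
from fractions import Fraction

def poly_add(p, q):
    """Add two polynomials given as coefficient lists [a0, a1, a2, a3]."""
    return [a + b for a, b in zip(p, q)]

def poly_scale(r, p):
    """Multiply every coefficient of p by the scalar r."""
    return [r * a for a in p]

p = [1, -2, 3, -4]                 # 1 - 2x + 3x^2 - 4x^3
q = [2, -3, 1, Fraction(-1, 2)]    # 2 - 3x + x^2 - (1/2)x^3

# The instance from the text: 3*p - 2*q.
result = poly_add(poly_scale(3, p), poly_scale(-2, q))
```

The list `result` holds the coefficients of -1 + 0x + 7x^2 - 11x^3, matching the computation above.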

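Although this space is not a subset of any $\mathbb{R}^n$, there is a sense in which we
can think of $\mathcal{P}_3$ as "the same" as $\mathbb{R}^4$. If we identify these two spaces' elements in
this way

$$ a_0 + a_1x + a_2x^2 + a_3x^3 \quad\text{corresponds to}\quad \begin{pmatrix} a_0\\ a_1\\ a_2\\ a_3\end{pmatrix} $$

then the operations also correspond. Here is an example of corresponding
additions.

$$ \begin{array}{r} 1 - 2x + 0x^2 + 1x^3 \\ +\; 2 + 3x + 7x^2 - 4x^3 \\ \hline 3 + 1x + 7x^2 - 3x^3 \end{array} \quad\text{corresponds to}\quad \begin{pmatrix} 1\\ -2\\ 0\\ 1\end{pmatrix} + \begin{pmatrix} 2\\ 3\\ 7\\ -4\end{pmatrix} = \begin{pmatrix} 3\\ 1\\ 7\\ -3\end{pmatrix} $$
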
Things we are thinking of as "the same" add to "the same" sum. Chapter Three 
makes precise this idea of vector space correspondence. For now we shall just 
leave it as an intuition. 

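1.9 Example The set $\mathcal{M}_{2\times 2}$ of 2×2 matrices with real number entries is a vector
space under the natural entry-by-entry operations.

$$ \begin{pmatrix} a & b\\ c & d\end{pmatrix} + \begin{pmatrix} w & x\\ y & z\end{pmatrix} = \begin{pmatrix} a+w & b+x\\ c+y & d+z\end{pmatrix} \qquad r\cdot\begin{pmatrix} a & b\\ c & d\end{pmatrix} = \begin{pmatrix} ra & rb\\ rc & rd\end{pmatrix} $$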

As in the prior example, we can think of this space as "the same" as $\mathbb{R}^4$.

1.10 Example The set $\{f \mid f\colon \mathbb{N} \to \mathbb{R}\}$ of all real-valued functions of one natural
number variable is a vector space under the operations

$$ (f_1 + f_2)(n) = f_1(n) + f_2(n) \qquad (r\cdot f)(n) = r\,f(n) $$

so that if, for example, $f_1(n) = n^2 + 2\sin(n)$ and $f_2(n) = -\sin(n) + 0.5$ then
$(f_1 + 2f_2)(n) = n^2 + 1$.

We can view this space as a generalization of Example 1.3 — instead of 2-tall 
vectors, these functions are like infinitely-tall vectors. 



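$$ \begin{array}{c|c} n & f(n) = n^2 + 1 \\ \hline 0 & 1 \\ 1 & 2 \\ 2 & 5 \\ 3 & 10 \\ \vdots & \vdots \end{array} \quad\text{corresponds to}\quad \begin{pmatrix} 1\\ 2\\ 5\\ 10\\ \vdots \end{pmatrix} $$

Addition and scalar multiplication are componentwise, as in Example 1.3. (We
can formalize "infinitely-tall" by saying that it means an infinite sequence, or
that it means a function from $\mathbb{N}$ to $\mathbb{R}$.)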
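
Functions can be treated as vectors quite literally in code. The sketch below assumes a function of one natural number variable is a Python callable, and builds the sum and scalar multiple pointwise; the names `f_add` and `f_scale` are illustrative. Because the sine values are floating-point numbers, results agree with n^2 + 1 only up to rounding.

```python
import math

def f_add(f, g):
    """Pointwise sum: (f + g)(n) = f(n) + g(n)."""
    return lambda n: f(n) + g(n)

def f_scale(r, f):
    """Pointwise scalar multiple: (r . f)(n) = r * f(n)."""
    return lambda n: r * f(n)

f1 = lambda n: n**2 + 2 * math.sin(n)
f2 = lambda n: -math.sin(n) + 0.5

# The combination f1 + 2*f2 from the example.
h = f_add(f1, f_scale(2, f2))

# The first few entries of the corresponding "infinitely-tall vector".
first_values = [h(n) for n in range(4)]
```

The entries of `first_values` are approximately 1, 2, 5, 10, the start of the column shown above.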

1.11 Example The set of polynomials with real coefficients

$$ \{a_0 + a_1x + \cdots + a_nx^n \mid n \in \mathbb{N} \text{ and } a_0, \ldots, a_n \in \mathbb{R}\} $$

makes a vector space when given the natural '+'

$$ (a_0 + a_1x + \cdots + a_nx^n) + (b_0 + b_1x + \cdots + b_nx^n) = (a_0 + b_0) + (a_1 + b_1)x + \cdots + (a_n + b_n)x^n $$

and '$\cdot$'.

$$ r\cdot(a_0 + a_1x + \cdots + a_nx^n) = (ra_0) + (ra_1)x + \cdots + (ra_n)x^n $$



This space differs from the space of Example 1.8. This space contains 
not just degree three polynomials, but degree thirty polynomials and degree 
three hundred polynomials, too. Each individual polynomial of course is of a 
finite degree, but the set has no single bound on the degree of all of its members. 

We can think of this example, like the prior one, in terms of infinite-tuples.
For instance, we can think of $1 + 3x + 5x^2$ as corresponding to $(1, 3, 5, 0, 0, \ldots)$.
However, this space differs from the one in Example 1.10. Here, each member of
the set has a finite degree, that is, under the correspondence there is no element
from this space matching $(1, 2, 5, 10, \ldots)$. Vectors in this space correspond to
infinite-tuples that end in zeroes.

1.12 Example The set $\{f \mid f\colon \mathbb{R} \to \mathbb{R}\}$ of all real-valued functions of one real
variable is a vector space under these.

$$ (f_1 + f_2)(x) = f_1(x) + f_2(x) \qquad (r\cdot f)(x) = r\,f(x) $$

The difference between this and Example 1.10 is the domain of the functions.

1.13 Example The set $F = \{a\cos\theta + b\sin\theta \mid a, b \in \mathbb{R}\}$ of real-valued functions of
the real variable $\theta$ is a vector space under the operations

$$ (a_1\cos\theta + b_1\sin\theta) + (a_2\cos\theta + b_2\sin\theta) = (a_1 + a_2)\cos\theta + (b_1 + b_2)\sin\theta $$

and

$$ r\cdot(a\cos\theta + b\sin\theta) = (ra)\cos\theta + (rb)\sin\theta $$

inherited from the space in the prior example. (We can think of F as "the same"
as $\mathbb{R}^2$ in that $a\cos\theta + b\sin\theta$ corresponds to the vector with components a and
b.)

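1.14 Example The set

$$ \{ f\colon \mathbb{R} \to \mathbb{R} \mid \frac{d^2f}{dx^2} + f = 0 \} $$

is a vector space under the, by now natural, interpretation.

$$ (f + g)(x) = f(x) + g(x) \qquad (r\cdot f)(x) = r\,f(x) $$

In particular, notice that closure is a consequence

$$ \frac{d^2(f+g)}{dx^2} + (f + g) = \left(\frac{d^2f}{dx^2} + f\right) + \left(\frac{d^2g}{dx^2} + g\right) $$

and

$$ \frac{d^2(rf)}{dx^2} + (rf) = r\left(\frac{d^2f}{dx^2} + f\right) $$

of basic Calculus. This turns out to equal the space from the prior example
(functions satisfying this differential equation have the form $a\cos\theta + b\sin\theta$)
but this description suggests an extension to solution sets of other differential
equations.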

1.15 Example The set of solutions of a homogeneous linear system in n variables is
a vector space under the operations inherited from $\mathbb{R}^n$. For example, for closure
under addition consider a typical equation in that system $c_1x_1 + \cdots + c_nx_n = 0$
and suppose that both these vectors

$$ \vec{v} = \begin{pmatrix} v_1\\ \vdots\\ v_n\end{pmatrix} \qquad \vec{w} = \begin{pmatrix} w_1\\ \vdots\\ w_n\end{pmatrix} $$

satisfy the equation. Then their sum $\vec{v} + \vec{w}$ also satisfies that equation:
$c_1(v_1 + w_1) + \cdots + c_n(v_n + w_n) = (c_1v_1 + \cdots + c_nv_n) + (c_1w_1 + \cdots + c_nw_n) = 0$. The
checks of the other vector space conditions are just as routine.
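
The closure argument can be tried on a concrete system. The sketch below uses a small homogeneous system that is purely hypothetical (it does not come from the text), stores each equation as a row of coefficients, and checks that a sum and a scalar multiple of solutions are again solutions.

```python
from fractions import Fraction

# Hypothetical homogeneous system, chosen only for illustration:
#    x + 2y -  z = 0
#   2x -  y + 3z = 0
A = [[1, 2, -1],
     [2, -1, 3]]

def satisfies(A, v):
    """True when v makes every equation c1*x1 + ... + cn*xn = 0 hold."""
    return all(sum(c * x for c, x in zip(row, v)) == 0 for row in A)

v = (-1, 1, 1)      # one solution
w = (2, -2, -2)     # another solution

v_plus_w = tuple(a + b for a, b in zip(v, w))
r_v = tuple(Fraction(5, 3) * a for a in v)
```

Both `v_plus_w` and `r_v` satisfy every equation of the system, as the general argument predicts.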

As we've done in those equations, we often omit the multiplication symbol
'$\cdot$' between the scalar and the vector. We can distinguish the multiplication in
'$c_1v_1$' from that in '$r\vec{v}$' by context, since if both multiplicands are real numbers
then it must be real-real multiplication while if one is a vector then it must be
scalar-vector multiplication.

Example 1.15 has brought us full circle since it is one of our motivating
examples. Now, with some feel for the kinds of structures that satisfy the
definition of a vector space, we can reflect on that definition. For example, why
specify in the definition the condition that $1\cdot\vec{v} = \vec{v}$ but not a condition that
$0\cdot\vec{v} = \vec{0}$?

One answer is that this is just a definition — it gives the rules of the game 
from here on, and if you don't like it, move on to something else. 

Another answer is perhaps more satisfying. People in this area have worked 
hard to develop the right balance of power and generality. This definition is 
shaped so that it contains the conditions needed to prove all of the interesting 
and important properties of spaces of linear combinations. As we proceed, we
shall derive all of the properties natural to collections of linear combinations 
from the conditions given in the definition. 

The next result is an example. We do not need to include these properties 
in the definition of vector space because they follow from the properties already 
listed there. 

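1.16 Lemma In any vector space V, for any $\vec{v} \in V$ and $r \in \mathbb{R}$, we have (1) $0\cdot\vec{v} = \vec{0}$,
and (2) $(-1\cdot\vec{v}) + \vec{v} = \vec{0}$, and (3) $r\cdot\vec{0} = \vec{0}$.

Proof For (1), note that $\vec{v} = (1 + 0)\cdot\vec{v} = \vec{v} + (0\cdot\vec{v})$. Add to both sides the
additive inverse of $\vec{v}$, the vector $\vec{w}$ such that $\vec{w} + \vec{v} = \vec{0}$.

$$ \begin{aligned} \vec{w} + \vec{v} &= \vec{w} + \vec{v} + 0\cdot\vec{v} \\ \vec{0} &= \vec{0} + 0\cdot\vec{v} \\ \vec{0} &= 0\cdot\vec{v} \end{aligned} $$

Item (2) is easy: $(-1\cdot\vec{v}) + \vec{v} = (-1 + 1)\cdot\vec{v} = 0\cdot\vec{v} = \vec{0}$. This shows that we can write
'$-\vec{v}$' for the additive inverse of $\vec{v}$ without worrying about possible confusion
with $(-1)\cdot\vec{v}$.

For (3), $r\cdot\vec{0} = r\cdot(0\cdot\vec{0}) = (r\cdot 0)\cdot\vec{0} = \vec{0}$ will do. QED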

We finish with a recap. Our study in Chapter One of Gaussian reduction 
led us to consider collections of linear combinations. So in this chapter we have 
defined a vector space to be a structure in which we can form such combinations, 
expressions of the form $c_1\cdot\vec{v}_1 + \cdots + c_n\cdot\vec{v}_n$ (subject to simple conditions on
the addition and scalar multiplication operations). In a phrase: vector spaces 
are the right context in which to study linearity. 

Finally, a comment. From the fact that it forms a whole chapter, and 
especially because that chapter is the first one, a reader could suppose that 
our purpose is the study of linear systems. The truth is, we will not so much 
use vector spaces in the study of linear systems as we will instead have linear 
systems start us on the study of vector spaces. The wide variety of examples 
from this subsection shows that the study of vector spaces is interesting and 
important in its own right, aside from how it helps us understand linear systems.
Linear systems won't go away. But from now on our primary objects of study
will be vector spaces.

Exercises 

1.17 Name the zero vector for each of these vector spaces. 

(a) The space of degree three polynomials under the natural operations. 

(b) The space of 2x4 matrices. 

(c) The space $\{f\colon [0..1] \to \mathbb{R} \mid f \text{ is continuous}\}$.

(d) The space of real-valued functions of one natural number variable. 
✓ 1.18 Find the additive inverse, in the vector space, of the vector.

(a) In $\mathcal{P}_3$, the vector $-3 - 2x + x^2$.

(b) In the space 2x2, 

-V 



(c) In $\{ae^x + be^{-x} \mid a, b \in \mathbb{R}\}$, the space of functions of the real variable x under
the natural operations, the vector $3e^x - 2e^{-x}$.
✓ 1.19 For each, list three elements and then show it is a vector space.

(a) The set of linear polynomials $\mathcal{P}_1 = \{a_0 + a_1x \mid a_0, a_1 \in \mathbb{R}\}$ under the usual
polynomial addition and scalar multiplication operations.

(b) The set of linear polynomials $\{a_0 + a_1x \mid a_0 - 2a_1 = 0\}$, under the usual
polynomial addition and scalar multiplication operations.

Hint. Use Example 1.3 as a guide. Most of the ten conditions are just verifications. 
1.20 For each, list three elements and then show it is a vector space. 

(a) The set of 2x2 matrices with real entries under the usual matrix operations. 

(b) The set of 2x2 matrices with real entries where the 2, 1 entry is zero, under 
the usual matrix operations. 

✓ 1.21 For each, list three elements and then show it is a vector space.

(a) The set of three-component row vectors with their usual operations.

(b) The set

$$ \{ \begin{pmatrix} x\\ y\\ z\\ w\end{pmatrix} \in \mathbb{R}^4 \mid x + y - z + w = 0 \} $$

under the operations inherited from $\mathbb{R}^4$.
✓ 1.22 Show that each of these is not a vector space. (Hint. Check closure by listing
two members of each set and trying some operations on them.)

(a) Under the operations inherited from $\mathbb{R}^3$, this set

$$ \{ \begin{pmatrix} x\\ y\\ z\end{pmatrix} \in \mathbb{R}^3 \mid x + y + z = 1 \} $$

(b) Under the operations inherited from $\mathbb{R}^3$, this set

$$ \{ \begin{pmatrix} x\\ y\\ z\end{pmatrix} \in \mathbb{R}^3 \mid x^2 + y^2 + z^2 = 1 \} $$

(c) Under the usual matrix operations,

$$ \{ \begin{pmatrix} a & 1\\ b & c\end{pmatrix} \mid a, b, c \in \mathbb{R} \} $$

(d) Under the usual polynomial operations,

$$ \{ a_0 + a_1x + a_2x^2 \mid a_0, a_1, a_2 \in \mathbb{R}^+ \} $$

where $\mathbb{R}^+$ is the set of reals greater than zero

(e) Under the inherited operations,

$$ \{ \begin{pmatrix} x\\ y\end{pmatrix} \in \mathbb{R}^2 \mid x + 3y = 4 \text{ and } 2x - y = 3 \text{ and } 6x + 4y = 10 \} $$

1.23 Define addition and scalar multiplication operations to make the complex
numbers a vector space over $\mathbb{R}$.
✓ 1.24 Is the set of rational numbers a vector space over $\mathbb{R}$ under the usual addition
and scalar multiplication operations?

1.25 Show that the set of linear combinations of the variables x, y , z is a vector space 
under the natural addition and scalar multiplication operations. 

1.26 Prove that this is not a vector space: the set of two-tall column vectors with 
real entries subject to these operations. 

1.27 Prove or disprove that $\mathbb{R}^3$ is a vector space under these operations.



(a) 



(b) 





















/X2 




+ 






\Z2 




and 



and 




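✓ 1.28 For each, decide if it is a vector space; the intended operations are the natural
ones.

(a) The diagonal 2×2 matrices

$$ \{ \begin{pmatrix} a & 0\\ 0 & b\end{pmatrix} \mid a, b \in \mathbb{R} \} $$

(b) This set of 2×2 matrices

$$ \{ \begin{pmatrix} x & x+y\\ x+y & y\end{pmatrix} \mid x, y \in \mathbb{R} \} $$

(c) This set

$$ \{ \begin{pmatrix} x\\ y\\ z\\ w\end{pmatrix} \in \mathbb{R}^4 \mid x + y + w = 1 \} $$

(d) The set of functions $\{f\colon \mathbb{R} \to \mathbb{R} \mid df/dx + 2f = 0\}$

(e) The set of functions $\{f\colon \mathbb{R} \to \mathbb{R} \mid df/dx + 2f = 1\}$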

✓ 1.29 Prove or disprove that this is a vector space: the real-valued functions f of one
real variable such that $f(7) = 0$.
✓ 1.30 Show that the set $\mathbb{R}^+$ of positive reals is a vector space when we interpret 'x + y'
to mean the product of x and y (so that 2 + 3 is 6), and we interpret '$r\cdot x$' as the
r-th power of x.

1.31 Is $\{(x, y) \mid x, y \in \mathbb{R}\}$ a vector space under these operations?

(a) $(x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2)$ and $r\cdot(x, y) = (rx, y)$

(b) $(x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2)$ and $r\cdot(x, y) = (rx, 0)$

1.32 Prove or disprove that this is a vector space: the set of polynomials of degree
greater than or equal to two, along with the zero polynomial.

1.33 At this point "the same" is only an intuition, but nonetheless for each vector
space identify the k for which the space is "the same" as $\mathbb{R}^k$.

(a) The 2×3 matrices under the usual operations

(b) The n×m matrices (under their usual operations)

(c) This set of 2×2 matrices

(d) This set of 2×2 matrices

✓ 1.34 Using $\vec{+}$ to represent vector addition and $\vec{\cdot}$ for scalar multiplication, restate
the definition of vector space.
✓ 1.35 Prove these.

(a) Any vector is the additive inverse of the additive inverse of itself.

(b) Vector addition left-cancels: if $\vec{v}, \vec{s}, \vec{t} \in V$ then $\vec{v} + \vec{s} = \vec{v} + \vec{t}$ implies that $\vec{s} = \vec{t}$.

1.36 The definition of vector spaces does not explicitly say that $\vec{0} + \vec{v} = \vec{v}$ (it instead
says that $\vec{v} + \vec{0} = \vec{v}$). Show that it must nonetheless hold in any vector space.
✓ 1.37 Prove or disprove that this is a vector space: the set of all matrices, under the
usual operations.

1.38 In a vector space every element has an additive inverse. Can some elements
have two or more?

1.39 (a) Prove that every point, line, or plane through the origin in $\mathbb{R}^3$ is a vector
space under the inherited operations.

(b) What if it doesn't contain the origin?
✓ 1.40 Using the idea of a vector space we can easily reprove that the solution set of
a homogeneous linear system has either one element or infinitely many elements.
Assume that $\vec{v} \in V$ is not $\vec{0}$.

(a) Prove that $r\cdot\vec{v} = \vec{0}$ if and only if $r = 0$.

(b) Prove that $r_1\cdot\vec{v} = r_2\cdot\vec{v}$ if and only if $r_1 = r_2$.

(c) Prove that any nontrivial vector space is infinite.

(d) Use the fact that a nonempty solution set of a homogeneous linear system is
a vector space to draw the conclusion.

1.41 Is this a vector space under the natural operations: the real-valued functions of
one real variable that are differentiable?

1.42 A vector space over the complex numbers $\mathbb{C}$ has the same definition as a vector
space over the reals except that scalars are drawn from $\mathbb{C}$ instead of from $\mathbb{R}$. Show
that each of these is a vector space over the complex numbers. (Recall how complex
numbers add and multiply: $(a_0 + a_1i) + (b_0 + b_1i) = (a_0 + b_0) + (a_1 + b_1)i$ and
$(a_0 + a_1i)(b_0 + b_1i) = (a_0b_0 - a_1b_1) + (a_0b_1 + a_1b_0)i$.)

(a) The set of degree two polynomials with complex coefficients

(b) This set

1.43 Name a property shared by all of the $\mathbb{R}^n$'s but not listed as a requirement for a
vector space.

✓ 1.44 (a) Prove that for any four vectors $\vec{v}_1, \ldots, \vec{v}_4 \in V$ we can associate their sum
in any way without changing the result.

$$ \begin{aligned} ((\vec{v}_1 + \vec{v}_2) + \vec{v}_3) + \vec{v}_4 &= (\vec{v}_1 + (\vec{v}_2 + \vec{v}_3)) + \vec{v}_4 = (\vec{v}_1 + \vec{v}_2) + (\vec{v}_3 + \vec{v}_4) \\ &= \vec{v}_1 + ((\vec{v}_2 + \vec{v}_3) + \vec{v}_4) = \vec{v}_1 + (\vec{v}_2 + (\vec{v}_3 + \vec{v}_4)) \end{aligned} $$

This allows us to write '$\vec{v}_1 + \vec{v}_2 + \vec{v}_3 + \vec{v}_4$' without ambiguity.

(b) Prove that any two ways of associating a sum of any number of vectors give
the same sum. (Hint. Use induction on the number of vectors.)

1.45 Example 1.5 gives a subset of $\mathbb{R}^2$ that is not a vector space, under the obvious
operations, because while it is closed under addition, it is not closed under scalar 
multiplication. Consider the set of vectors in the plane whose components have 
the same sign or are 0. Show that this set is closed under scalar multiplication but 
not addition. 

1.46 For any vector space, a subset that is itself a vector space under the inherited 
operations (e.g., a plane through the origin inside of $\mathbb{R}^3$) is a subspace.

(a) Show that $\{a_0 + a_1x + a_2x^2 \mid a_0 + a_1 + a_2 = 0\}$ is a subspace of the vector
space of degree two polynomials.

(b) Show that this is a subspace of the 2x2 matrices. 



(c) Show that a nonempty subset S of a real vector space is a subspace if and only
if it is closed under linear combinations of pairs of vectors: whenever $c_1, c_2 \in \mathbb{R}$
and $\vec{s}_1, \vec{s}_2 \in S$ then the combination $c_1\vec{s}_1 + c_2\vec{s}_2$ is in S.



1.2 Subspaces and Spanning Sets

One of the examples that led us to introduce the idea of a vector space was the
solution set of a homogeneous system. For instance, we've seen in Example 1.4
such a space that is a planar subset of $\mathbb{R}^3$. There, the vector space $\mathbb{R}^3$ contains
inside it another vector space, the plane.

2.1 Definition For any vector space, a subspace is a subset that is itself a vector
space, under the inherited operations.

2.2 Example The plane from the prior subsection,

$$ P = \{ \begin{pmatrix} x\\ y\\ z\end{pmatrix} \mid x + y + z = 0 \} $$

is a subspace of $\mathbb{R}^3$. As specified in the definition, the operations are the ones
that are inherited from the larger space, that is, vectors add in P as they add in
$\mathbb{R}^3$

$$ \begin{pmatrix} x_1\\ y_1\\ z_1\end{pmatrix} + \begin{pmatrix} x_2\\ y_2\\ z_2\end{pmatrix} = \begin{pmatrix} x_1+x_2\\ y_1+y_2\\ z_1+z_2\end{pmatrix} $$

and scalar multiplication is also the same as it is in $\mathbb{R}^3$. To show that P is a
subspace, we need only note that it is a subset and then verify that it is a space.
Checking that P satisfies the conditions in the definition of a vector space is
routine. For instance, for closure under addition, note that if the summands
satisfy that $x_1 + y_1 + z_1 = 0$ and $x_2 + y_2 + z_2 = 0$ then the sum satisfies that
$(x_1 + x_2) + (y_1 + y_2) + (z_1 + z_2) = (x_1 + y_1 + z_1) + (x_2 + y_2 + z_2) = 0$.

2.3 Example The x-axis in $\mathbb{R}^2$ is a subspace where the addition and scalar
multiplication operations are the inherited ones.

$$ \{ \begin{pmatrix} x\\ 0\end{pmatrix} \mid x \in \mathbb{R} \} $$

As above, to verify that this is a subspace we simply note that it is a subset and
then check that it satisfies the conditions in the definition of a vector space. For
instance, the two closure conditions are satisfied: (1) adding two vectors with a 
second component of zero results in a vector with a second component of zero, 
and (2) multiplying a scalar times a vector with a second component of zero 
results in a vector with a second component of zero. 

2.4 Example Another subspace of $\mathbb{R}^2$ is its trivial subspace.

$$ \{ \begin{pmatrix} 0\\ 0\end{pmatrix} \} $$

Any vector space has a trivial subspace $\{\vec{0}\}$. At the opposite extreme, any
vector space has itself for a subspace. These two are the improper subspaces. 
Other subspaces are proper. 

2.5 Example The definition requires that the addition and scalar multiplication
operations must be the ones inherited from the larger space. The set $S = \{1\}$ is
a subset of $\mathbb{R}^1$. And, under the operations $1 + 1 = 1$ and $r\cdot 1 = 1$ the set S is
a vector space, specifically, a trivial space. However, S is not a subspace of $\mathbb{R}^1$
because those aren't the inherited operations, since of course $\mathbb{R}^1$ has $1 + 1 = 2$.

2.6 Example All kinds of vector spaces, not just $\mathbb{R}^n$'s, have subspaces. The vector
space of cubic polynomials $\{a + bx + cx^2 + dx^3 \mid a, b, c, d \in \mathbb{R}\}$ has a subspace
comprised of all linear polynomials $\{m + nx \mid m, n \in \mathbb{R}\}$.

2.7 Example Another example of a subspace not taken from an $\mathbb{R}^n$ is one from the
examples following the definition of a vector space. The space of all real-valued
functions of one real variable $f\colon \mathbb{R} \to \mathbb{R}$ has a subspace of functions satisfying
the restriction $(d^2f/dx^2) + f = 0$.

2.8 Example Being vector spaces themselves, subspaces must satisfy the closure
conditions. The set $\mathbb{R}^+$ is not a subspace of the vector space $\mathbb{R}^1$ because with
the inherited operations it is not closed under scalar multiplication: if $\vec{v} = 1$
then $-1\cdot\vec{v} \notin \mathbb{R}^+$.

The next result says that Example 2.8 is prototypical. The only way that 
a subset can fail to be a subspace, if it is nonempty and under the inherited 
operations, is if it isn't closed.

2.9 Lemma For a nonempty subset S of a vector space, under the inherited
operations, the following are equivalent statements.*

(1) S is a subspace of that vector space

(2) S is closed under linear combinations of pairs of vectors: for any vectors
$\vec{s}_1, \vec{s}_2 \in S$ and scalars $r_1, r_2$ the vector $r_1\vec{s}_1 + r_2\vec{s}_2$ is in S

(3) S is closed under linear combinations of any number of vectors: for any
vectors $\vec{s}_1, \ldots, \vec{s}_n \in S$ and scalars $r_1, \ldots, r_n$ the vector $r_1\vec{s}_1 + \cdots + r_n\vec{s}_n$ is
in S.

Briefly, a subset is a subspace if it is closed under linear combinations.

Proof 'The following are equivalent' means that each pair of statements is
equivalent.

$$ (1) \iff (2) \qquad (2) \iff (3) \qquad (3) \iff (1) $$

We will prove the equivalence by establishing that $(1) \implies (3) \implies (2) \implies (1)$.
This strategy is suggested by the observation that $(1) \implies (3)$ and $(3) \implies (2)$
are easy and so we need only argue the single implication $(2) \implies (1)$.

Assume that S is a nonempty subset of a vector space V that is closed
under combinations of pairs of vectors. We will show that S is a vector space by
checking the conditions.

The first item in the vector space definition has five conditions. First, for
closure under addition, if $\vec{s}_1, \vec{s}_2 \in S$ then $\vec{s}_1 + \vec{s}_2 \in S$, as $\vec{s}_1 + \vec{s}_2 = 1\cdot\vec{s}_1 + 1\cdot\vec{s}_2$.
Second, for any $\vec{s}_1, \vec{s}_2 \in S$, because addition is inherited from V, the sum $\vec{s}_1 + \vec{s}_2$
in S equals the sum $\vec{s}_1 + \vec{s}_2$ in V, and that equals the sum $\vec{s}_2 + \vec{s}_1$ in V (because
V is a vector space, its addition is commutative), and that in turn equals the
sum $\vec{s}_2 + \vec{s}_1$ in S. The argument for the third condition is similar to that for the
second. For the fourth, consider the zero vector of V and note that closure of S
under linear combinations of pairs of vectors gives that (where $\vec{s}$ is any member
of the nonempty set S) $0\cdot\vec{s} + 0\cdot\vec{s} = \vec{0}$ is in S; showing that $\vec{0}$ acts under the
inherited operations as the additive identity of S is easy. The fifth condition is
satisfied because for any $\vec{s} \in S$, closure under linear combinations shows that the
vector $0\cdot\vec{0} + (-1)\cdot\vec{s}$ is in S; showing that it is the additive inverse of $\vec{s}$ under
the inherited operations is routine.

The checks for the scalar multiplication conditions are similar; see Exercise 33.
QED

We will usually verify that a subset is a subspace with $(2) \implies (1)$.

2.10 Remark At the start of this chapter we introduced vector spaces as collections
in which linear combinations "make sense." Lemma 2.9's statements (1)-(3)
say that we can always make sense of an expression like $r_1\vec{s}_1 + r_2\vec{s}_2$, without
restrictions on the r's, in that the vector described is in the set S.

For a contrast, consider the set T of two-tall vectors whose entries add to
a number greater than or equal to zero. Here we cannot just write a linear
combination such as $2\vec{t}_1 - 3\vec{t}_2$ and be sure the result is an element of T, that is,
T doesn't satisfy statement (2).

*More information on equivalence of statements is in the appendix.
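
Statement (2) of the lemma suggests a mechanical experiment: form combinations of pairs of members and see whether any lands outside the set. The sketch below is a search for counterexamples, not a proof of closure; it assumes vectors are tuples, and the sample vectors, the scalar range, and the helper names are all illustrative choices. It is run on the plane x + y + z = 0 and on the set T described in the remark.

```python
from fractions import Fraction

def combo(r1, s1, r2, s2):
    """The linear combination r1*s1 + r2*s2 of two tuples."""
    return tuple(r1 * a + r2 * b for a, b in zip(s1, s2))

def in_P(v):
    """Plane through the origin: entries add to zero."""
    return sum(v) == 0

def in_T(v):
    """Two-tall vectors whose entries add to a number >= 0."""
    return sum(v) >= 0

def find_counterexample(member, samples, scalars):
    """Return (r1, s1, r2, s2) whose combination leaves the set, or None."""
    for s1 in samples:
        for s2 in samples:
            for r1 in scalars:
                for r2 in scalars:
                    if not member(combo(r1, s1, r2, s2)):
                        return (r1, s1, r2, s2)
    return None

scalars = [Fraction(n, 2) for n in range(-4, 5)]
P_samples = [(1, 1, -2), (-1, 0, 1), (0, 0, 0), (3, -5, 2)]
T_samples = [(1, 0), (0, 1), (2, -1), (0, 0)]

P_result = find_counterexample(in_P, P_samples, scalars)
T_result = find_counterexample(in_T, T_samples, scalars)
```

The search finds nothing for the plane, which is consistent with it being a subspace, while for T it returns a combination with a negative scalar that leaves the set.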

Lemma 2.9 suggests that a good way to think of a vector space is as a 
collection of unrestricted linear combinations. The next two examples take some
spaces and recast their descriptions to be in that form.

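2.11 Example We can show that this plane through the origin subset of $\mathbb{R}^3$

$$ S = \{ \begin{pmatrix} x\\ y\\ z\end{pmatrix} \mid x - 2y + z = 0 \} $$

is a subspace under the usual addition and scalar multiplication operations
of column vectors by checking that it is nonempty and closed under linear
combinations of two vectors as in Example 2.2. But there is another way. Think
of $x - 2y + z = 0$ as a one-equation linear system and parametrize it by expressing
the leading variable in terms of the free variables $x = 2y - z$.

$$ S = \{ \begin{pmatrix} 2y - z\\ y\\ z\end{pmatrix} \mid y, z \in \mathbb{R} \} = \{ y\begin{pmatrix} 2\\ 1\\ 0\end{pmatrix} + z\begin{pmatrix} -1\\ 0\\ 1\end{pmatrix} \mid y, z \in \mathbb{R} \} \qquad (*) $$

Now, to show that this is a subspace consider $r_1\vec{s}_1 + r_2\vec{s}_2$. Each $\vec{s}_i$ is a linear
combination of the two vectors in $(*)$ so this is a linear combination of linear
combinations.

$$ r_1\cdot\left( y_1\begin{pmatrix} 2\\ 1\\ 0\end{pmatrix} + z_1\begin{pmatrix} -1\\ 0\\ 1\end{pmatrix} \right) + r_2\cdot\left( y_2\begin{pmatrix} 2\\ 1\\ 0\end{pmatrix} + z_2\begin{pmatrix} -1\\ 0\\ 1\end{pmatrix} \right) $$

The Linear Combination Lemma, Lemma One.III.2.3, shows that this is a linear
combination of the two vectors and so Lemma 2.9's statement (2) is satisfied.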
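
The "linear combination of linear combinations" step can be checked on sample numbers. Solving x = 2y - z writes every member of the plane as y(2, 1, 0) + z(-1, 0, 1); the sketch below assumes that parametrization, stores vectors as tuples, and uses parameter values and scalars chosen only for illustration.

```python
from fractions import Fraction

B1 = (2, 1, 0)    # vector multiplying the parameter y
B2 = (-1, 0, 1)   # vector multiplying the parameter z

def member(y, z):
    """The plane member y*B1 + z*B2."""
    return tuple(y * a + z * b for a, b in zip(B1, B2))

def on_plane(v):
    """Direct test of the defining condition x - 2y + z = 0."""
    x, y, z = v
    return x - 2 * y + z == 0

s1 = member(3, -1)
s2 = member(Fraction(1, 2), 4)
r1, r2 = 5, -2

# Combine the two members directly ...
lhs = tuple(r1 * a + r2 * b for a, b in zip(s1, s2))
# ... and compare with the single member whose parameters are combined.
rhs = member(r1 * 3 + r2 * Fraction(1, 2), r1 * (-1) + r2 * 4)
```

The two results agree, and the combination again satisfies the plane's equation, which is what statement (2) requires.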

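2.12 Example This is a subspace of the 2×2 matrices $\mathcal{M}_{2\times 2}$.

$$ L = \{ \begin{pmatrix} a & 0\\ b & c\end{pmatrix} \mid a + b + c = 0 \} $$

To parametrize, express the condition as $a = -b - c$.

$$ L = \{ \begin{pmatrix} -b-c & 0\\ b & c\end{pmatrix} \mid b, c \in \mathbb{R} \} = \{ b\begin{pmatrix} -1 & 0\\ 1 & 0\end{pmatrix} + c\begin{pmatrix} -1 & 0\\ 0 & 1\end{pmatrix} \mid b, c \in \mathbb{R} \} $$

As above, we've described the subspace as a collection of unrestricted linear
combinations. To show it is a subspace, note that a linear combination of vectors
from L is a linear combination of linear combinations and so statement (2) is
true.

2.13 Definition The span (or linear closure) of a nonempty subset S of a vector
space is the set of all linear combinations of vectors from S.

$$ [S] = \{ c_1\vec{s}_1 + \cdots + c_n\vec{s}_n \mid c_1, \ldots, c_n \in \mathbb{R} \text{ and } \vec{s}_1, \ldots, \vec{s}_n \in S \} $$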

The span of the empty subset of a vector space is the trivial subspace. 

No notation for the span is completely standard. The square brackets used here 
are common but so are 'span(S)' and 'sp(S)'. 

2.14 Remark In Chapter One, after we showed that we can write the solution
set of a homogeneous linear system as $\{c_1\vec{\beta}_1 + \cdots + c_k\vec{\beta}_k \mid c_1, \ldots, c_k \in \mathbb{R}\}$, we
described that as the set 'generated' by the $\vec{\beta}$'s. We now call that the span of
$\{\vec{\beta}_1, \ldots, \vec{\beta}_k\}$.

Recall also the discussion of the "tricky point" in that proof. The span of 
the empty set is defined to be the set {0} because we follow the convention that 
a linear combination of no vectors sums to 0. Besides, defining the empty set's 
span to be the trivial subspace is convenient in that it keeps results like the next 
one from needing exceptions for the empty set. 

2.15 Lemma In a vector space, the span of any subset is a subspace. 

Proof If the subset S is empty then by definition its span is the trivial subspace. 
If S is not empty then by Lemma 2.9 we need only check that the span [S] 
is closed under linear combinations. For a pair of vectors from that span,
$\vec{v} = c_1\vec{s}_1 + \cdots + c_n\vec{s}_n$ and $\vec{w} = c_{n+1}\vec{s}_{n+1} + \cdots + c_m\vec{s}_m$, a linear combination

$$ p\cdot(c_1\vec{s}_1 + \cdots + c_n\vec{s}_n) + r\cdot(c_{n+1}\vec{s}_{n+1} + \cdots + c_m\vec{s}_m) = pc_1\vec{s}_1 + \cdots + pc_n\vec{s}_n + rc_{n+1}\vec{s}_{n+1} + \cdots + rc_m\vec{s}_m $$

(p, r scalars) is a linear combination of elements of S and so is in [S] (possibly
some of the $\vec{s}_i$'s from $\vec{v}$ equal some of the $\vec{s}_j$'s from $\vec{w}$, but it does not matter).
QED 
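
The bookkeeping in this proof can be illustrated by treating a member of the span as a record of coefficients. The sketch below assumes a linear combination is stored as a dictionary from generator names to coefficients; the names `s1`, `s2`, `s3` and the numbers are hypothetical. It also shows why shared generators do not matter: their coefficients simply add.

```python
from collections import defaultdict
from fractions import Fraction

def combine(p, v, r, w):
    """Return p*v + r*w, where v and w are linear combinations stored as
    {generator_name: coefficient} dictionaries."""
    out = defaultdict(Fraction)
    for name, c in v.items():
        out[name] += p * c
    for name, c in w.items():
        out[name] += r * c
    return dict(out)

v = {"s1": Fraction(2), "s2": Fraction(-1)}      # 2*s1 - s2
w = {"s2": Fraction(3), "s3": Fraction(1, 2)}    # 3*s2 + (1/2)*s3

# 4*v - 2*w is again a single linear combination of s1, s2, s3.
u = combine(Fraction(4), v, Fraction(-2), w)
```

The result is one dictionary of coefficients on generators from S, so it describes a member of the span.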

The converse of the lemma holds: any subspace is the span of some set, 
because a subspace is obviously the span of the set of its members. Thus a 
subset of a vector space is a subspace if and only if it is a span. This fits the 
intuition that a good way to think of a vector space is as a collection in which 
linear combinations are sensible. 

Taken together, Lemma 2.9 and Lemma 2.15 show that the span of a subset
S of a vector space is the smallest subspace containing all the members of S. 

2.16 Example In any vector space V, for any vector $\vec{v}$ the set $\{r\cdot\vec{v} \mid r \in \mathbb{R}\}$ is a
subspace of V. For instance, for any vector $\vec{v} \in \mathbb{R}^3$ the line through the origin
containing that vector $\{k\vec{v} \mid k \in \mathbb{R}\}$ is a subspace of $\mathbb{R}^3$. This is true even when
$\vec{v}$ is the zero vector, in which case the subspace is the degenerate line, the trivial
subspace.

2.17 Example The span of this set is all of $\mathbb{R}^2$.

$$ \{ \begin{pmatrix} 1\\ 1\end{pmatrix}, \begin{pmatrix} 1\\ -1\end{pmatrix} \} $$

To check this we must show that any member of $\mathbb{R}^2$ is a linear combination of
these two vectors. So we ask: for which vectors (with real components x and y)
are there scalars $c_1$ and $c_2$ such that this holds?

$$ c_1\begin{pmatrix} 1\\ 1\end{pmatrix} + c_2\begin{pmatrix} 1\\ -1\end{pmatrix} = \begin{pmatrix} x\\ y\end{pmatrix} $$

Gauss's Method

$$ \begin{array}{rcl} c_1 + c_2 &=& x \\ c_1 - c_2 &=& y \end{array} \;\xrightarrow{-\rho_1 + \rho_2}\; \begin{array}{rcl} c_1 + c_2 &=& x \\ -2c_2 &=& -x + y \end{array} $$

with back substitution gives $c_2 = (x - y)/2$ and $c_1 = (x + y)/2$. These two
equations show that for any x and y there are appropriate coefficients $c_1$ and $c_2$
making the above vector equation true. For instance, for $x = 1$ and $y = 2$ the
coefficients $c_2 = -1/2$ and $c_1 = 3/2$ will do. That is, we can write any vector in
$\mathbb{R}^2$ as a linear combination of the two given vectors.
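
The back-substitution formulas give the coefficients for any target vector, so they can be wrapped in a small function. This is a sketch under the assumption, read off from the system above, that the two spanning vectors are (1, 1) and (1, -1); the function names are illustrative.

```python
from fractions import Fraction

def coefficients(x, y):
    """Coefficients c1, c2 from the back substitution in the text."""
    c1 = (Fraction(x) + Fraction(y)) / 2
    c2 = (Fraction(x) - Fraction(y)) / 2
    return c1, c2

def combine(c1, c2):
    """The vector c1*(1, 1) + c2*(1, -1)."""
    return (c1 + c2, c1 - c2)

# The instance from the text: x = 1, y = 2.
c1, c2 = coefficients(1, 2)
rebuilt = combine(c1, c2)
```

Rebuilding the vector from its coefficients returns the original (x, y), which is the sense in which the two vectors span the plane.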

Since spans are subspaces, and we know that a good way to understand a 
subspace is to parametrize its description, we can try to understand a set's span 
in that way. 

2.18 Example Consider, in $\mathcal{P}_2$, the span of the set $\{3x - x^2, 2x\}$. By the definition
of span, it is the set of unrestricted linear combinations of the two
$\{c_1(3x - x^2) + c_2(2x) \mid c_1, c_2 \in \mathbb{R}\}$. Clearly polynomials in this span must have
a constant term of zero. Is that necessary condition also sufficient?

We are asking: for which members $a_2x^2 + a_1x + a_0$ of $\mathcal{P}_2$ are there $c_1$ and
$c_2$ such that $a_2x^2 + a_1x + a_0 = c_1(3x - x^2) + c_2(2x)$? Since polynomials are
equal if and only if their coefficients are equal, we are looking for conditions on
$a_2$, $a_1$, and $a_0$ satisfying these.

$$ \begin{array}{rcl} -c_1 &=& a_2 \\ 3c_1 + 2c_2 &=& a_1 \\ 0 &=& a_0 \end{array} $$

Gauss's Method gives that $c_1 = -a_2$, $c_2 = (3/2)a_2 + (1/2)a_1$, and $0 = a_0$. Thus
the only condition on polynomials in the span is the condition that we knew
of: as long as $a_0 = 0$, we can give appropriate coefficients $c_1$ and $c_2$ to describe
the polynomial $a_0 + a_1x + a_2x^2$ as in the span. For instance, for the polynomial
$0 - 4x + 3x^2$, the coefficients $c_1 = -3$ and $c_2 = 5/2$ will do. So the span of the
given set is $\{a_1x + a_2x^2 \mid a_1, a_2 \in \mathbb{R}\}$.
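
The solution of that system doubles as a membership test for the span. The sketch below assumes a quadratic is given by its coefficients (a0, a1, a2) and returns the coefficients on 3x - x^2 and 2x when they exist; the function names are illustrative.

```python
from fractions import Fraction

def span_coefficients(a0, a1, a2):
    """Return (c1, c2) with a0 + a1*x + a2*x^2 = c1*(3x - x^2) + c2*(2x),
    or None when a0 != 0, in which case the polynomial is outside the span."""
    if a0 != 0:
        return None
    c1 = -Fraction(a2)
    c2 = Fraction(3, 2) * a2 + Fraction(1, 2) * a1
    return c1, c2

def expand(c1, c2):
    """Coefficients (a0, a1, a2) of c1*(3x - x^2) + c2*(2x)."""
    return (0, 3 * c1 + 2 * c2, -c1)

# The instance from the text: the polynomial -4x + 3x^2.
c = span_coefficients(0, -4, 3)
```

Expanding the returned coefficients reproduces the original polynomial, and any polynomial with a nonzero constant term is rejected.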

This shows, incidentally, that the set $\{x, x^2\}$ also spans this subspace. A space
can have more than one spanning set. Two other sets spanning this subspace
are $\{x, x^2, -x + 2x^2\}$ and $\{x, x + x^2, x + 2x^2, \ldots\}$. (Naturally, we usually prefer
to work with spanning sets that have only a few members.)

2.19 Example These are the subspaces of $\mathbb{R}^3$ that we now know of: the trivial
subspace, the lines through the origin, the planes through the origin, and the
whole space (of course, the picture shows only a few of the infinitely many
subspaces). In the next section we will prove that $\mathbb{R}^3$ has no other type of
subspaces, so in fact this picture shows them all.

[Figure: subspaces of $\mathbb{R}^3$ arranged in levels (the whole space, planes through the origin, lines through the origin, the trivial subspace), each connected to its supersets.]

We have described the subspaces as spans of sets with a minimal number of 
members and shown them connected to their supersets. Note that the subspaces 
fall naturally into levels — planes on one level, lines on another, etc. — according 
to how many vectors are in a minimal-sized spanning set. 

So far in this chapter we have seen that to study the properties of linear 
combinations, the right setting is a collection that is closed under these combina- 
tions. In the first subsection we introduced such collections, vector spaces, and 
we saw a great variety of examples. In this subsection we saw still more spaces, 
ones that happen to be subspaces of others. In all of the variety we've seen a 
commonality. Example 2.19 above brings it out: vector spaces and subspaces 
are best understood as a span, and especially as a span of a small number of 
vectors. The next section studies spanning sets that are minimal. 

Exercises 

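✓ 2.20 Which of these subsets of the vector space of 2×2 matrices are subspaces
under the inherited operations? For each one that is a subspace, parametrize its
description. For each that is not, give a condition that fails.

(a) $\{ \begin{pmatrix} a & 0\\ 0 & b\end{pmatrix} \mid a, b \in \mathbb{R} \}$

(b) $\{ \begin{pmatrix} a & 0\\ 0 & b\end{pmatrix} \mid a + b = 0 \}$

(c) $\{ \begin{pmatrix} a & 0\\ 0 & b\end{pmatrix} \mid a + b = 5 \}$

(d) $\{ \begin{pmatrix} a & c\\ 0 & b\end{pmatrix} \mid a + b = 0, c \in \mathbb{R} \}$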

/ 2.21 Is this a subspace of Vz'- {uo + qix + azx^ | ao + 2ai + az = 4}? If it is then 
parametrize its description. 
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/ 2.22 Decide if the vector lies in the span of the set, inside of the space. 

(a) ' ^ ' ^' 

(b) $x - x^3$, $\{x^2, 2x + x^2, x + x^3\}$, in $\mathcal{P}_3$



(c) 



{ 



in M2 



^2 0^ 
J V'V2 3y 

2.23 Which of these are members of the span $[\{\cos^2 x, \sin^2 x\}]$ in the vector space of
real-valued functions of one real variable?

(a) $f(x) = 1$ (b) $f(x) = 3 + x^2$ (c) $f(x) = \sin x$ (d) $f(x) = \cos(2x)$

✓ 2.24 Which of these sets spans $\mathbb{R}^3$? That is, which of these sets has the property
that any three-tall vector can be expressed as a suitable linear combination of the
set's elements?

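✓ 2.25 Parametrize each subspace's description. Then express each subspace as a
span.

(a) The subset $\{(a\ \ b\ \ c) \mid a - c = 0\}$ of the three-wide row vectors

(b) This subset of $\mathcal{M}_{2\times 2}$

$$\{\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mid a + d = 0\}$$

(c) This subset of $\mathcal{M}_{2\times 2}$

$$\{\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mid 2a - c - d = 0 \text{ and } a + 3b = 0\}$$

(d) The subset $\{a + bx + cx^3 \mid a - 2b + c = 0\}$ of $\mathcal{P}_3$

(e) The subset of $\mathcal{P}_2$ of quadratic polynomials $p$ such that $p(7) = 0$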

✓ 2.26 Find a set to span the given subspace of the given space. (Hint. Parametrize
each.)

(a) the $xz$-plane in $\mathbb{R}^3$

(b) $\{\begin{pmatrix} x \\ y \\ z \end{pmatrix} \mid 3x + 2y + z = 0\}$ in $\mathbb{R}^3$

(c) $\{\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} \mid 2x + y + w = 0 \text{ and } y + 2z = 0\}$ in $\mathbb{R}^4$

(d) $\{a_0 + a_1x + a_2x^2 + a_3x^3 \mid a_0 + a_1 = 0 \text{ and } a_2 - a_3 = 0\}$ in $\mathcal{P}_3$

(e) The set $\mathcal{P}_4$ in the space $\mathcal{P}_4$

(f) $\mathcal{M}_{2\times 2}$ in $\mathcal{M}_{2\times 2}$

2.27 Is $\mathbb{R}^2$ a subspace of $\mathbb{R}^3$?

✓ 2.28 Decide if each is a subspace of the vector space of real-valued functions of one
real variable.

(a) The even functions $\{f\colon \mathbb{R} \to \mathbb{R} \mid f(-x) = f(x) \text{ for all } x\}$. For example, two
members of this set are $f_1(x) = x^2$ and $f_2(x) = \cos(x)$.

(b) The odd functions $\{f\colon \mathbb{R} \to \mathbb{R} \mid f(-x) = -f(x) \text{ for all } x\}$. Two members are
$f_3(x) = x^3$ and $f_4(x) = \sin(x)$.

2.29 Example 2.16 says that for any vector $\vec{v}$ that is an element of a vector space
$V$, the set $\{r \cdot \vec{v} \mid r \in \mathbb{R}\}$ is a subspace of $V$. (This is, of course, simply the span of
the singleton set $\{\vec{v}\}$.) Must any such subspace be a proper subspace, or can it be
improper?

2.30 An example following the definition of a vector space shows that the solution
set of a homogeneous linear system is a vector space. In the terminology of this
subsection, it is a subspace of $\mathbb{R}^n$ where the system has $n$ variables. What about
a non-homogeneous linear system; do its solutions form a subspace (under the
inherited operations)?

2.31 [Cleary] Give an example of each or explain why it would be impossible to do 
so. 

(a) A nonempty subset of $\mathcal{M}_{2\times 2}$ that is not a subspace.

(b) A set of two vectors in $\mathbb{R}^2$ that does not span the space.

2.32 Example 2.19 shows that $\mathbb{R}^3$ has infinitely many subspaces. Does every
nontrivial space have infinitely many subspaces?

2.33 Finish the proof of Lemma 2.9. 

2.34 Show that each vector space has only one trivial subspace. 

✓ 2.35 Show that for any subset $S$ of a vector space, the span of the span equals the
span: $[[S]] = [S]$. (Hint. Members of $[S]$ are linear combinations of members of $S$.
Members of $[[S]]$ are linear combinations of linear combinations of members of $S$.)

2.36 All of the subspaces that we've seen use zero in their description in some way.
For example, the subspace in Example 2.3 consists of all the vectors from $\mathbb{R}^2$ with
a second component of zero. In contrast, the collection of vectors from $\mathbb{R}^2$ with a
second component of one does not form a subspace (it is not closed under scalar
multiplication). Another example is Example 2.2, where the condition on the
vectors is that the three components add to zero. If the condition were that the
three components add to one then it would not be a subspace (again, it would fail
to be closed). This exercise shows that a reliance on zero is not strictly necessary.
Consider the set

$$\{\begin{pmatrix} x \\ y \\ z \end{pmatrix} \mid x + y + z = 1\}$$

under these operations.

$$\begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} + \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix} = \begin{pmatrix} x_1 + x_2 - 1 \\ y_1 + y_2 \\ z_1 + z_2 \end{pmatrix} \qquad r \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} rx - r + 1 \\ ry \\ rz \end{pmatrix}$$

(a) Show that it is not a subspace of $\mathbb{R}^3$. (Hint. See Example 2.5.)

(b) Show that it is a vector space. Note that by the prior item, Lemma 2.9 cannot
apply.

(c) Show that any subspace of $\mathbb{R}^3$ must pass through the origin, and so any
subspace of $\mathbb{R}^3$ must involve zero in its description. Does the converse hold?
Does any subset of $\mathbb{R}^3$ that contains the origin become a subspace when given
the inherited operations?

2.37 We can give a justification for the convention that the sum of zero-many vectors
equals the zero vector. Consider this sum of three vectors $\vec{v}_1 + \vec{v}_2 + \vec{v}_3$.

(a) What is the difference between this sum of three vectors and the sum of the
first two of these three?

(b) What is the difference between the prior sum and the sum of just the first
one vector?

(c) What should be the difference between the prior sum of one vector and the 
sum of no vectors? 

(d) So what should be the definition of the sum of no vectors? 

2.38 Is a space determined by its subspaces? That is, if two vector spaces have the 
same subspaces, must the two be equal? 

2.39 (a) Give a set that is closed under scalar multiplication but not addition. 

(b) Give a set closed under addition but not scalar multiplication. 

(c) Give a set closed under neither. 

2.40 Show that the span of a set of vectors does not depend on the order in which 
the vectors are listed in that set. 

2.41 Which trivial subspace is the span of the empty set? Is it

$$\{\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}\} \subseteq \mathbb{R}^3$$

or some other subspace?

2.42 Show that if a vector is in the span of a set then adding that vector to the set
won't make the span any bigger. Is that also 'only if'?

✓ 2.43 Subspaces are subsets and so we naturally consider how 'is a subspace of'
interacts with the usual set operations.

(a) If $A, B$ are subspaces of a vector space, must their intersection $A \cap B$ be a
subspace? Always? Sometimes? Never?

(b) Must the union $A \cup B$ be a subspace?

(c) If $A$ is a subspace, must its complement be a subspace?
(Hint. Try some test subspaces from Example 2.19.)

✓ 2.44 Does the span of a set depend on the enclosing space? That is, if $W$ is a
subspace of $V$ and $S$ is a subset of $W$ (and so also a subset of $V$), might the span
of $S$ in $W$ differ from the span of $S$ in $V$?

2.45 Is the relation 'is a subspace of' transitive? That is, if $V$ is a subspace of $W$
and $W$ is a subspace of $X$, must $V$ be a subspace of $X$?

✓ 2.46 Because 'span of' is an operation on sets we naturally consider how it interacts
with the usual set operations.

(a) If $S \subseteq T$ are subsets of a vector space, is $[S] \subseteq [T]$? Always? Sometimes?
Never?

(b) If $S, T$ are subsets of a vector space, is $[S \cup T] = [S] \cup [T]$?

(c) If $S, T$ are subsets of a vector space, is $[S \cap T] = [S] \cap [T]$?

(d) Is the span of the complement equal to the complement of the span?

2.47 Reprove Lemma 2.15 without doing the empty set separately.

2.48 Find a structure that is closed under linear combinations, and yet is not a
vector space. (Remark. This is a bit of a trick question.)


II Linear Independence

The prior section shows how to understand a vector space as a span, as an
unrestricted linear combination of some of its elements. For example, the space
of linear polynomials $\{a + bx \mid a, b \in \mathbb{R}\}$ is spanned by the set $\{1, x\}$. The prior
section also showed that a space can have many sets that span it. Two more
sets that span the space of linear polynomials are $\{1, 2x\}$ and $\{1, x, 2x\}$.

At the end of that section we described some spanning sets as 'minimal' 
but we never precisely defined that word. We could mean that a spanning set 
is minimal if it contains the smallest number of members of any set with the 
same span, so that {1,x, 2x} is not minimal because it has three members while 
we've given spanning sets with two. Or we could mean that a spanning set is 
minimal when it has no elements that we can remove without changing the span. 
Under this meaning $\{1, x, 2x\}$ is not minimal because removing the $2x$ to get
$\{1, x\}$ leaves the span unchanged.

The first sense of minimality appears to be a global requirement, in that 
to check if a spanning set is minimal we seemingly must look at all the sets 
that span and find one with the least number of elements. The second sense 
of minimality is local since we need to look only at the set and consider the 
span with and without various elements. For instance, using the second sense
we could compare the span of $\{1, x, 2x\}$ with the span of $\{1, x\}$ and note that the
$2x$ is a "repeat" in that its removal doesn't shrink the span.
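
The local test described above lends itself to a short computation. This is an illustrative sketch only, assuming a linear polynomial $a + bx$ is stored as the pair `(a, b)`; the helpers `rank` and `removable` are names introduced here, not the book's. Because removing an element can never enlarge a span, an unchanged rank means an unchanged span.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def removable(spanning_set):
    """Members whose individual removal leaves the span unchanged."""
    full = rank(spanning_set)
    out = []
    for i, v in enumerate(spanning_set):
        rest = spanning_set[:i] + spanning_set[i + 1:]
        if rank(rest) == full:
            out.append(v)
    return out

# (a, b) stands for the linear polynomial a + b*x.
print(removable([(1, 0), (0, 1), (0, 2)]))  # [(0, 1), (0, 2)]  from {1, x, 2x}
print(removable([(1, 0), (0, 1)]))          # []                from {1, x}
```

Note that for $\{1, x, 2x\}$ either $x$ or $2x$ can be dropped on its own, but not both at once; the test is applied one element at a time.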

In this section we will use the second sense of 'minimal spanning set' because 
of this technical convenience. However, the most important result of this book 
is that the two senses coincide. We will prove that in the next section. 



II.1 Definition and Examples

1.1 Example Recall the Statics example from the opening of Section One.I. We
first got a balance with the unknown-mass objects at 40 cm and 15 cm and then
got another balance at -50 cm and 25 cm. With those two pieces of information
we could compute values of the unknown masses. Had we instead gotten the
second balance at 20 cm and 7.5 cm then we would not have been able to find the
unknown values. The difficulty is that the (20 7.5) information is a "repeat" of
the (40 15) information. That is, (20 7.5) is in the span of the set {(40 15)}
and so we would be trying to solve a two-unknowns problem with essentially
one piece of information.
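
One quick way to see the "repeat" numerically is a $2 \times 2$ determinant test. That test is a standard fact that this chapter has not yet developed, so treat the following as a side sketch rather than the text's method; the function name `det2` is an assumption of the sketch. A nonzero value means the two rows carry two independent pieces of information.

```python
from fractions import Fraction

def det2(row1, row2):
    """Determinant of the 2x2 array whose rows are row1 and row2."""
    return row1[0] * row2[1] - row1[1] * row2[0]

first = (Fraction(40), Fraction(15))
second_good = (Fraction(-50), Fraction(25))
second_repeat = (Fraction(20), Fraction(15, 2))  # 7.5 written exactly

print(det2(first, second_good))    # 1750, nonzero: both unknowns are determined
print(det2(first, second_repeat))  # 0: the second balance repeats the first
```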

As that example shows, to know whether adding a vector to a set will increase
the span or conversely whether removing that vector will decrease the span, we
need to know whether the vector is a linear combination of other members of
the set.

1.2 Definition A multiset subset of a vector space is linearly independent if none
of its elements is a linear combination of the others.* Otherwise it is linearly
dependent.

*More information on multisets is in the appendix.

Observe that, although this way of writing one vector as a combination of
the others

$$\vec{s}_0 = c_1\vec{s}_1 + c_2\vec{s}_2 + \cdots + c_n\vec{s}_n$$

visually sets $\vec{s}_0$ off from the other vectors, algebraically there is nothing special
about it in that equation. For any $\vec{s}_i$ with a coefficient $c_i$ that is non-0 we can
rewrite the relationship to set off $\vec{s}_i$.

$$\vec{s}_i = (1/c_i)\vec{s}_0 + \cdots + (-c_{i-1}/c_i)\vec{s}_{i-1} + (-c_{i+1}/c_i)\vec{s}_{i+1} + \cdots + (-c_n/c_i)\vec{s}_n$$

When we don't want to single out any vector by writing it alone on one side of
the equation we will instead say that $\vec{s}_0, \vec{s}_1, \ldots, \vec{s}_n$ are in a linear relationship
and write the relationship with all of the vectors on the same side. The next
result rephrases the linear independence definition in this style. It is how we
usually compute whether a finite set is dependent or independent.

1.3 Lemma A subset $S$ of a vector space is linearly independent if and only if
among the elements $\vec{s}_1, \ldots, \vec{s}_n \in S$ the only linear relationship

$$c_1\vec{s}_1 + \cdots + c_n\vec{s}_n = \vec{0} \qquad c_1, \ldots, c_n \in \mathbb{R}$$

is the trivial one $c_1 = 0, \ldots, c_n = 0$.

Proof If $S$ is linearly independent then no vector $\vec{s}_i$ is a linear combination
of other vectors from $S$ so there is no linear relationship where some of the $\vec{s}$'s
have nonzero coefficients.

If $S$ is not linearly independent then some $\vec{s}_i$ is a linear combination $\vec{s}_i =
c_1\vec{s}_1 + \cdots + c_{i-1}\vec{s}_{i-1} + c_{i+1}\vec{s}_{i+1} + \cdots + c_n\vec{s}_n$ of other vectors from $S$. Subtracting
$\vec{s}_i$ from both sides gives a relationship involving a nonzero coefficient, the $-1$ in
front of $\vec{s}_i$. QED

1.4 Example In the vector space of two-wide row vectors, the two-element set
$\{(40\ \ 15), (-50\ \ 25)\}$ is linearly independent. To check this, set

$$c_1 \cdot (40\ \ 15) + c_2 \cdot (-50\ \ 25) = (0\ \ 0)$$

and solve the resulting system.

$$\begin{array}{rcl} 40c_1 - 50c_2 &=& 0 \\ 15c_1 + 25c_2 &=& 0 \end{array} \quad\xrightarrow{-(15/40)\rho_1+\rho_2}\quad \begin{array}{rcl} 40c_1 - 50c_2 &=& 0 \\ (175/4)c_2 &=& 0 \end{array}$$

This shows that both $c_1$ and $c_2$ are zero. So the only linear relationship between the
two given row vectors is the trivial relationship.

In the same vector space, $\{(40\ \ 15), (20\ \ 7.5)\}$ is linearly dependent since
we can satisfy

$$c_1 \cdot (40\ \ 15) + c_2 \cdot (20\ \ 7.5) = (0\ \ 0)$$

with $c_1 = 1$ and $c_2 = -2$.
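
The computation in Lemma 1.3 can be mechanized. The following is a minimal sketch under the assumption that we test independence by comparing the rank of the set with its size (only the trivial relationship exists exactly when the two agree); the names `rank` and `is_independent` are introduced here for illustration. It replays both sets from Example 1.4 and confirms the quoted relationship with $c_1 = 1$, $c_2 = -2$.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_independent(vectors):
    # Rank equal to the number of vectors means the homogeneous system
    # c1*s1 + ... + cn*sn = 0 has only the trivial solution.
    return rank(vectors) == len(vectors)

first_set = [(40, 15), (-50, 25)]
second_set = [(40, 15), (20, Fraction(15, 2))]  # 7.5 written exactly

print(is_independent(first_set))   # True
print(is_independent(second_set))  # False

# The nontrivial relationship from the text: c1 = 1, c2 = -2.
c1, c2 = 1, -2
combo = tuple(c1 * a + c2 * b for a, b in zip(*second_set))
print(combo == (0, 0))  # True
```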

1.5 Example The set $\{1 + x, 1 - x\}$ is linearly independent in $\mathcal{P}_2$, the space of
quadratic polynomials with real coefficients, because

$$0 + 0x + 0x^2 = c_1(1 + x) + c_2(1 - x) = (c_1 + c_2) + (c_1 - c_2)x + 0x^2$$

gives

$$\begin{array}{rcl} c_1 + c_2 &=& 0 \\ c_1 - c_2 &=& 0 \end{array} \quad\xrightarrow{-\rho_1+\rho_2}\quad \begin{array}{rcl} c_1 + c_2 &=& 0 \\ -2c_2 &=& 0 \end{array}$$

since polynomials are equal only if their coefficients are equal. Thus, the only
linear relationship between these two members of $\mathcal{P}_2$ is the trivial one.

1.6 Example The rows of this matrix 

/l 3 1 

A= -1 
\0 

form a linearly independent set. This is easy to check in this case, but also recall
that Lemma One.III.2.5 shows that the nonzero rows of any echelon form matrix form a
linearly independent set.

1.7 Example In M^, where 




the set $S = \{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is linearly dependent because this is a relationship

$$0 \cdot \vec{v}_1 + 2 \cdot \vec{v}_2 - 1 \cdot \vec{v}_3 = \vec{0}$$

where not all of the scalars are zero (the fact that some of the scalars are zero 
doesn't matter). 

That example illustrates why, although Definition 1.2 is a clearer statement
of what independence is, Lemma 1.3 is more useful for computations. Working
straight from the definition, someone trying to compute whether $S$ is linearly
independent would start by setting $\vec{v}_1 = c_2\vec{v}_2 + c_3\vec{v}_3$ and concluding that there
are no such $c_2$ and $c_3$. But knowing that the first vector is not dependent on the
other two is not enough. This person would have to go on to try $\vec{v}_2 = c_1\vec{v}_1 + c_3\vec{v}_3$
to find the dependence $c_1 = 0$, $c_3 = 1/2$. Lemma 1.3 gets the same conclusion
with only one computation.

1.8 Example The empty subset of a vector space is linearly independent. There
is no nontrivial linear relationship among its members as it has no members.

1.9 Example In any vector space, any subset containing the zero vector is linearly
dependent. For example, in the space $\mathcal{P}_2$ of quadratic polynomials, consider the
subset $\{1 + x, x + x^2, 0\}$.

One way to see that this subset is linearly dependent is to use Lemma 1.3: we
have $0 \cdot \vec{v}_1 + 0 \cdot \vec{v}_2 + 1 \cdot \vec{0} = \vec{0}$, and this is a nontrivial relationship as not all of the
coefficients are zero. Another way to see that this subset is linearly dependent is
to go straight to Definition 1.2: we can express the third member of the subset
as a linear combination of the first two, namely, we can satisfy $c_1\vec{v}_1 + c_2\vec{v}_2 = \vec{0}$
by taking $c_1 = 0$ and $c_2 = 0$ (in contrast to the lemma, the definition allows all
of the coefficients to be zero).

There is a subtler way to see that this subset is dependent. The zero vector is
equal to the trivial sum, the sum of the empty set. So a set containing the zero
vector has an element that is a combination of a subset of other vectors from
the set, specifically, the zero vector is a combination of the empty subset.

1.10 Remark [Velleman] Definition 1.2 says that when we decide whether some $S$
is linearly independent, we must consider it as a multiset. Here is an example
showing that we can need multiset rather than set (recall that in a set repeated
elements collapse so that the set $\{0, 1, 0\}$ equals the set $\{0, 1\}$, whereas in a
multiset they do not collapse so that the multiset $\{0, 1, 0\}$ contains the element 0
twice). In the next chapter we will look at functions. Let the function $f\colon \mathcal{P}_1 \to \mathbb{R}$
be $f(a + bx) = a$; for instance, $f(1 + 2x) = 1$. Consider the subset $B = \{1, 1 + x\}$
of the domain. The images of the elements are $f(1) = 1$ and $f(1 + x) = 1$.
Because in a set repeated elements collapse to be a single element these images
form the one-element set $\{1\}$, which is linearly independent. But in a multiset
repeated elements do not collapse so these images form a linearly dependent
multiset $\{1, 1\}$. The second case is the correct one: $B$ is linearly independent but
its image under $f$ is linearly dependent.
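
The set versus multiset distinction in this remark maps directly onto Python's `set` and `list`. The sketch below is illustrative only: it stores $a + bx$ as the pair `(a, b)`, and the helper `independent_in_R` encodes the fact that in the one-dimensional space $\mathbb{R}$ a multiset of numbers is independent exactly when it is empty or a single nonzero number. Both helper names are assumptions of this sketch.

```python
def constant_term(p):
    """The map f(a + bx) = a, with the polynomial a + bx stored as the pair (a, b)."""
    a, _b = p
    return a

def independent_in_R(numbers):
    """In the one-dimensional space R, a multiset of reals is linearly independent
    exactly when it is empty or is a single nonzero number."""
    return len(numbers) == 0 or (len(numbers) == 1 and numbers[0] != 0)

B = [(1, 0), (1, 1)]                              # the polynomials 1 and 1 + x
images_multiset = [constant_term(p) for p in B]   # keeps the repeat
images_set = sorted(set(images_multiset))         # collapses the repeat

print(images_multiset, independent_in_R(images_multiset))  # [1, 1] False
print(images_set, independent_in_R(images_set))            # [1] True
```

Collapsing the images into a set hides the dependence $1 \cdot 1 - 1 \cdot 1 = 0$, which is the reason the definition is phrased for multisets.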

Most of the time we won't need the set-multiset distinction and we will 
typically follow the standard convention of referring to a linearly independent 
or dependent "set." 

This section began with a discussion and an example about when a set
contains "repeat" elements, ones that we can omit without shrinking the span.
The next result characterizes when this happens. And, it supports the definition
of linear independence because it says that such a set is a minimal spanning set
in that we cannot omit any element without changing its span.

1.11 Lemma If $\vec{v}$ is a member of a vector space $V$ and $S \subseteq V$ then $[S - \{\vec{v}\}] \subseteq [S]$.
Also: (1) if $\vec{v} \in S$ then $[S - \{\vec{v}\}] = [S]$ if and only if $\vec{v} \in [S - \{\vec{v}\}]$ and (2) the
condition that removal of any $\vec{v} \in S$ shrinks the span $[S - \{\vec{v}\}] \neq [S]$ holds if and
only if $S$ is linearly independent.

Proof First, $[S - \{\vec{v}\}] \subseteq [S]$ because an element of $[S - \{\vec{v}\}]$ is a linear combination
of elements of $S - \{\vec{v}\}$, and so is a linear combination of elements of $S$, and so is
an element of $[S]$.

For statement (1), one half of the if and only if is easy: if $\vec{v} \notin [S - \{\vec{v}\}]$ then
$[S - \{\vec{v}\}] \neq [S]$ since the set on the right contains $\vec{v}$ while the set on the left does
not.

The other half of the if and only if assumes that $\vec{v} \in [S - \{\vec{v}\}]$, so that it
is a combination $\vec{v} = c_1\vec{s}_1 + \cdots + c_n\vec{s}_n$ of members of $S - \{\vec{v}\}$. To show that
$[S - \{\vec{v}\}] = [S]$, by the first paragraph we need only show that each element
of $[S]$ is an element of $[S - \{\vec{v}\}]$. So consider a linear combination $d_1\vec{s}_{n+1} +
\cdots + d_m\vec{s}_{n+m} + d_{m+1}\vec{v} \in [S]$ (we can assume that each $\vec{s}_{n+j}$ is unequal to $\vec{v}$).
Substitute for $\vec{v}$

$$d_1\vec{s}_{n+1} + \cdots + d_m\vec{s}_{n+m} + d_{m+1}(c_1\vec{s}_1 + \cdots + c_n\vec{s}_n)$$

to get a linear combination of linear combinations of members of $S - \{\vec{v}\}$, which
is a member of $[S - \{\vec{v}\}]$.

For statement (2) assume first that $S$ is linearly independent and that $\vec{v} \in S$.
If removal of $\vec{v}$ did not shrink the span, so that $\vec{v} \in [S - \{\vec{v}\}]$, then we would have
$\vec{v} = c_1\vec{s}_1 + \cdots + c_n\vec{s}_n$, which would be a linear dependence among members of
$S$, contradicting that $S$ is independent. Hence $\vec{v} \notin [S - \{\vec{v}\}]$ and the two sets are
not equal.

Do the other half of this if and only if statement by assuming that $S$ is not
linearly independent, so that some linear dependence $\vec{s} = c_1\vec{s}_1 + \cdots + c_n\vec{s}_n$
holds among its members (with no $\vec{s}_i$ equal to $\vec{s}$). Then $\vec{s} \in [S - \{\vec{s}\}]$ and by
statement (1) its removal will not shrink the span $[S - \{\vec{s}\}] = [S]$. QED
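
Statement (2) of Lemma 1.11 can be spot-checked on small cases. This is a hedged sketch, not a proof: the three test sets in $\mathbb{R}^3$ are hypothetical examples chosen for the sketch, and the helper names are illustrative. For each set it prints whether the set is independent and whether every single removal lowers the rank; the lemma says the two answers agree.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def independent(S):
    return rank(S) == len(S)

def every_removal_shrinks(S):
    # A smaller rank after removal means a strictly smaller span.
    full = rank(S)
    return all(rank(S[:i] + S[i + 1:]) < full for i in range(len(S)))

examples = [
    [(1, 0, 0), (0, 1, 0)],             # independent
    [(1, 0, 0), (2, 0, 0), (0, 1, 0)],  # dependent: second is twice the first
    [(1, 0, 0), (0, 1, 0), (1, 1, 0)],  # dependent: third is the sum of the others
]
results = [(independent(S), every_removal_shrinks(S)) for S in examples]
print(results)  # [(True, True), (False, False), (False, False)]
```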

We can also express that in terms of adding vectors rather than of omitting 
them. 

1.12 Lemma If $\vec{v}$ is a member of the vector space $V$ and $S$ is a subset of $V$ then
$[S] \subseteq [S \cup \{\vec{v}\}]$. Also: (1) adding $\vec{v}$ to $S$ does not increase the span $[S] = [S \cup \{\vec{v}\}]$
if and only if $\vec{v} \in [S]$, and (2) if $S$ is linearly independent then adjoining $\vec{v}$ to $S$
gives a set that is also linearly independent if and only if $\vec{v} \notin [S]$.

Proof The first sentence and statement (1) are translations of the first sentence
and statement (1) from the prior result.

For statement (2) assume that $S$ is linearly independent. Suppose first
that $\vec{v} \notin [S]$. If adjoining $\vec{v}$ to $S$ resulted in a nontrivial linear relationship
$c_1\vec{s}_1 + c_2\vec{s}_2 + \cdots + c_n\vec{s}_n + c_{n+1}\vec{v} = \vec{0}$ then because the linear independence of $S$
implies that $c_{n+1} \neq 0$ (or else the equation would be a nontrivial relationship
among members of $S$), we could rewrite the relationship as $\vec{v} = -(c_1/c_{n+1})\vec{s}_1 -
\cdots - (c_n/c_{n+1})\vec{s}_n$ to get the contradiction that $\vec{v} \in [S]$. Therefore if $\vec{v} \notin [S]$ then
the only linear relationship is trivial.

Conversely, if we suppose that $\vec{v} \in [S]$ then there is a dependence $\vec{v} =
c_1\vec{s}_1 + \cdots + c_n\vec{s}_n$ ($\vec{s}_i \in S$) inside of $S$ with $\vec{v}$ adjoined. QED
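
Lemma 1.12(2) can be tried out in the same computational style. The sketch below uses hypothetical vectors in $\mathbb{R}^3$ picked for illustration, and the helper names `in_span` and `independent` are assumptions of the sketch. It adjoins one vector from inside the span of an independent set and one from outside, and reports membership in the span alongside independence of the enlarged set.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def in_span(v, S):
    # v is in [S] exactly when adjoining it does not raise the rank.
    return rank(S + [v]) == rank(S)

def independent(S):
    return rank(S) == len(S)

S = [(1, 0, 0), (0, 1, 0)]        # independent; its span is the xy-plane
inside, outside = (3, -2, 0), (0, 0, 1)

print(in_span(inside, S), independent(S + [inside]))    # True False
print(in_span(outside, S), independent(S + [outside]))  # False True
```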
1.13 Example This subset of $\mathbb{R}^3$



is linearly independent. 

S={ 




The span of S is the x-axis. Here are two supersets, one that is linearly dependent 
and the other independent. 



dependent: { 




independent: { 




This illustrates Lemma 1.12: we got the dependent superset by adding a vector 
in the x-axis and so the span did not grow, while we got the independent superset 
by adding a vector that isn't in [S] because it has a nonzero y component. 
For the independent set 



the span $[S]$ is the $xy$-plane. Here are two supersets.




dependent: { 




independent: 




} 



As above, the additional member of the dependent superset comes from [S], here 
the xy-plane, while the additional member of the independent superset comes 
from outside of that plane. 

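Now consider this independent set with $[S] = \mathbb{R}^3$.

$$S = \{\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\}$$
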
Here is a linearly dependent superset 

^1 

dependent: { 

but there is no linearly independent superset. One way to see that is to note
that for any vector that we would add to $S$, the equation

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = c_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + c_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + c_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$

has a solution $c_1 = x$, $c_2 = y$, and $c_3 = z$. Another way to see it is Lemma 1.12:
we cannot add any vectors from outside of the span $[S]$ because that span is all
of $\mathbb{R}^3$.

1.14 Corollary In a vector space, any finite set has a linearly independent subset
with the same span.

Proof If $S = \{\vec{s}_1, \ldots, \vec{s}_n\}$ is linearly independent then $S$ itself satisfies the
statement, so assume that it is linearly dependent.

By the definition of dependence, $S$ contains a vector $\vec{v}_1$ that is a linear
combination of the others. Define the set $S_1 = S - \{\vec{v}_1\}$. By Lemma 1.11 the
span does not shrink: $[S_1] = [S]$ (since adding $\vec{v}_1$ to $S_1$ would not cause the span
to grow).

If $S_1$ is linearly independent then we are done. Otherwise iterate: take a
vector $\vec{v}_2$ that is a linear combination of other members of $S_1$ and discard it
to derive $S_2 = S_1 - \{\vec{v}_2\}$ such that $[S_2] = [S_1]$. Repeat this until a linearly
independent set $S_j$ appears; one must appear eventually because $S$ is finite and
the empty set is linearly independent. (Formally, this argument uses induction
on the number of elements in $S$. Exercise 38 asks for the details.) QED

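1.15 Example This set spans $\mathbb{R}^3$ (the check is routine) but is not linearly
independent.

$$S = \{\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}, \begin{pmatrix} 3 \\ 3 \\ 0 \end{pmatrix}\}$$

We will find vectors to drop to get a subset that is independent but has the
same span. This linear relationship

$$c_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + c_2 \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix} + c_3 \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + c_4 \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} + c_5 \begin{pmatrix} 3 \\ 3 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \qquad (*)$$

gives this system

$$\begin{array}{rcl} c_1 + c_3 + 3c_5 &=& 0 \\ 2c_2 + 2c_3 - c_4 + 3c_5 &=& 0 \\ c_4 &=& 0 \end{array}$$

whose solution set has this parametrization.

$$\{\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \end{pmatrix} = c_3 \begin{pmatrix} -1 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + c_5 \begin{pmatrix} -3 \\ -3/2 \\ 0 \\ 0 \\ 1 \end{pmatrix} \mid c_3, c_5 \in \mathbb{R}\}$$

If we set the free variable $c_5$ to 1 and the other free variable $c_3$ to 0, then we get $c_1 = -3$,
$c_2 = -3/2$, and $c_4 = 0$. We have this instance of (*).

$$-3 \cdot \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - \frac{3}{2} \cdot \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix} + 0 \cdot \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + 0 \cdot \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} + 1 \cdot \begin{pmatrix} 3 \\ 3 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

Thus the vector associated with the free variable $c_5$ is in the span of the set of
vectors associated with the leading variables $c_1$ and $c_2$. Lemma 1.11 says that
we can discard the fifth vector without shrinking the span.

Similarly, in the parametrization of the solution set let $c_3 = 1$ and $c_5 = 0$,
to get an instance of (*) showing that we can discard the third vector without
shrinking the span.

Thus this set

$$\{\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}\}$$

has the same span as $S$. We can easily check that it is linearly independent and
so discarding any of its elements will shrink the span.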
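
The discarding procedure of Corollary 1.14 can be sketched as a single left-to-right pass: keep a vector only if it raises the rank of what has been kept so far. This is one possible implementation, not the book's algorithm verbatim, and the function names are illustrative. The five input vectors are read off as the coefficient columns of the system in Example 1.15.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def independent_subset(vectors):
    """Keep each vector that is not in the span of those already kept."""
    kept = []
    for v in vectors:
        if rank(kept + [v]) > rank(kept):
            kept.append(v)
    return kept

# Columns of the system c1 + c3 + 3c5 = 0, 2c2 + 2c3 - c4 + 3c5 = 0, c4 = 0.
S = [(1, 0, 0), (0, 2, 0), (1, 2, 0), (0, -1, 1), (3, 3, 0)]
subset = independent_subset(S)

print(subset)                   # [(1, 0, 0), (0, 2, 0), (0, -1, 1)]
print(rank(subset) == rank(S))  # True: the span is unchanged
```

The pass drops the third and fifth vectors, matching the example. A different ordering of the input could keep a different, equally valid, independent subset.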

1.16 Corollary A subset $S = \{\vec{s}_1, \ldots, \vec{s}_n\}$ of a vector space is linearly dependent
if and only if some $\vec{s}_i$ is a linear combination of the vectors $\vec{s}_1, \ldots, \vec{s}_{i-1}$ listed
before it.

Proof Consider $S_0 = \{\}$, $S_1 = \{\vec{s}_1\}$, $S_2 = \{\vec{s}_1, \vec{s}_2\}$, etc. Some index $i \geq 1$ is the
first one with $S_{i-1} \cup \{\vec{s}_i\}$ linearly dependent, and there $\vec{s}_i \in [S_{i-1}]$. QED

The proof of Corollary 1.14 describes producing a linearly independent set by 
shrinking, that is, by taking subsets. And the proof of Corollary 1.16 describes 
finding a linearly dependent set by taking supersets. We finish this subsection 
by considering how linear independence and dependence interact with the subset 
relation between sets. 

1.17 Lemma Any subset of a linearly independent set is also linearly independent. 
Any superset of a linearly dependent set is also linearly dependent. 

Proof Both are clear. QED 

Restated, subset preserves independence and superset preserves dependence. 

Those are two of the four possible cases. The third case, whether subset 
preserves linear dependence, is covered by Example 1.15, which gives a linearly 
dependent set S with one subset that is linearly dependent and another that is 
independent. The fourth case, whether superset preserves linear independence, 
is covered by Example 1.13, which gives cases where a linearly independent set 
has both an independent and a dependent superset. 

This table summarizes. 




                   $\hat{S} \subset S$                   $\hat{S} \supset S$
$S$ independent    $\hat{S}$ must be independent         $\hat{S}$ may be either
$S$ dependent      $\hat{S}$ may be either               $\hat{S}$ must be dependent



Example 1.13 has something else to say about the interaction between linear
independence and superset. It names a linearly independent set that is maximal
in that it has no supersets that are linearly independent. By Lemma 1.12 a
linearly independent set is maximal if and only if it spans the entire space, 
because that is when no vector exists that is not already in the span. This nicely 
complements the fact that Lemma 1.11 shows that a spanning set is minimal if 
and only if it is linearly independent. 

In summary, we have introduced the definition of linear independence to 
formalize the idea of the minimality of a spanning set. We have developed some 
properties of this idea. The most important is Lemma 1.12, which tells us that 
a linearly independent set is maximal when it spans the space. 

Exercises 

is linearly dependent or linearly indepen- 




e; 

✓ 1.19 Which of these subsets of $\mathcal{P}_3$ are linearly dependent and which are
independent?

(a) $\{3 - x + 9x^2, 5 - 6x + 3x^2, 1 + 1x - 5x^2\}$

(b) $\{-x^2, 1 + 4x^2\}$

(c) $\{2 + x + 7x^2, 3 - x + 2x^2, 4 - 3x^2\}$

(d) $\{8 + 3x + 3x^2, x + 2x^2, 2 + 2x + 2x^2, 8 - 2x + 5x^2\}$

✓ 1.20 Prove that each set $\{f, g\}$ is linearly independent in the vector space of all
functions from $\mathbb{R}^+$ to $\mathbb{R}$.

(a) $f(x) = x$ and $g(x) = 1/x$

(b) $f(x) = \cos(x)$ and $g(x) = \sin(x)$

(c) $f(x) = e^x$ and $g(x) = \ln(x)$

✓ 1.21 Which of these subsets of the space of real-valued functions of one real variable
is linearly dependent and which is linearly independent? (Note that we have
abbreviated some constant functions; e.g., in the first item, the '2' stands for the
constant function $f(x) = 2$.)

(a) $\{2, 4\sin^2(x), \cos^2(x)\}$ (b) $\{1, \sin(x), \sin(2x)\}$ (c) $\{x, \cos(x)\}$

(d) $\{(1 + x)^2, x^2 + 2x, 3\}$ (e) $\{\cos(2x), \sin^2(x), \cos^2(x)\}$ (f) $\{0, x, x^2\}$

1.22 Does the equation $\sin^2(x)/\cos^2(x) = \tan^2(x)$ show that this set of functions
$\{\sin^2(x), \cos^2(x), \tan^2(x)\}$ is a linearly dependent subset of the set of all real-valued
functions with domain the interval $(-\pi/2\,..\,\pi/2)$ of real numbers between $-\pi/2$ and
$\pi/2$?

1.23 Is the $xy$-plane subset of the vector space $\mathbb{R}^3$ linearly independent?

✓ 1.24 Show that the nonzero rows of an echelon form matrix form a linearly
independent set.

✓ 1.25 (a) Show that if the set $\{\vec{u}, \vec{v}, \vec{w}\}$ is linearly independent then so is the set
$\{\vec{u}, \vec{u} + \vec{v}, \vec{u} + \vec{v} + \vec{w}\}$.

(b) What is the relationship between the linear independence or dependence of
$\{\vec{u}, \vec{v}, \vec{w}\}$ and the independence or dependence of $\{\vec{u} - \vec{v}, \vec{v} - \vec{w}, \vec{w} - \vec{u}\}$?

1.26 Example 1.8 shows that the empty set is linearly independent. 

(a) When is a one-element set linearly independent? 

(b) How about a set with two elements? 

1.27 In any vector space V, the empty set is linearly independent. What about all 
of V? 

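1.28 Show that if $\{\vec{x}, \vec{y}, \vec{z}\}$ is linearly independent then so are all of its proper
subsets: $\{\vec{x}, \vec{y}\}$, $\{\vec{x}, \vec{z}\}$, $\{\vec{y}, \vec{z}\}$, $\{\vec{x}\}$, $\{\vec{y}\}$, $\{\vec{z}\}$, and $\{\}$. Is that 'only if' also?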

1.29 (a) Show that this 



is a linearly independent subset of ] 
(b) Show that 



is in the span of S by finding Ci and C2 giving a linear relationship. 





Show that the pair Ci , C2 is unique. 

(c) Assume that $S$ is a subset of a vector space and that $\vec{v}$ is in $[S]$, so that $\vec{v}$ is a
linear combination of vectors from $S$. Prove that if $S$ is linearly independent then
a linear combination of vectors from $S$ adding to $\vec{v}$ is unique (that is, unique up
to reordering and adding or taking away terms of the form $0 \cdot \vec{s}$). Thus $S$ as a
spanning set is minimal in this strong sense: each vector in $[S]$ is a combination
of elements of $S$ a minimum number of times, only once.

(d) Prove that it can happen when S is not linearly independent that distinct 
linear combinations sum to the same vector. 

1.30 Prove that a polynomial gives rise to the zero function if and only if it is
the zero polynomial. (Comment. This question is not a Linear Algebra matter,
but we often use the result. A polynomial gives rise to a function in the natural
way: $x \mapsto c_nx^n + \cdots + c_1x + c_0$.)

1.31 Return to Section 1.2 and redefine point, line, plane, and other linear surfaces 
to avoid degenerate cases. 

1.32 (a) Show that any set of four vectors in is linearly dependent. 

(b) Is this true for any set of five? Any set of three? 

(c) What is the most number of elements that a linearly independent subset of 

can have? 

/ 1.33 Is there a set of four vectors in , any three of which form a linearly independent 
set? 
1.34 Must every linearly dependent set have a subset that is dependent and a subset
that is independent?

1.35 In R*, what is the biggest linearly independent set you can find? The smallest? 
The biggest linearly dependent set? The smallest? ('Biggest' and 'smallest' mean 
that there are no supersets or subsets with the same property.) 

✓ 1.36 Linear independence and linear dependence are properties of sets. We can thus
naturally ask how the properties of linear independence and dependence act with
respect to the familiar elementary set relations and operations. In the body of this
subsection we have covered the subset and superset relations. We can also consider
the operations of intersection, complementation, and union.

(a) How does linear independence relate to intersection: can an intersection of
linearly independent sets be independent? Must it be?

(b) How does linear independence relate to complementation?

(c) Show that the union of two linearly independent sets can be linearly
independent.

(d) Show that the union of two linearly independent sets need not be linearly
independent.

1.37 Continued from prior exercise. What is the interaction between the property
of linear independence and the operation of union?

(a) We might conjecture that the union $S \cup T$ of linearly independent sets is linearly
independent if and only if their spans have a trivial intersection $[S] \cap [T] = \{\vec{0}\}$.
What is wrong with this argument for the 'if' direction of that conjecture? "If
the union $S \cup T$ is linearly independent then the only solution to $c_1\vec{s}_1 + \cdots +
c_n\vec{s}_n + d_1\vec{t}_1 + \cdots + d_m\vec{t}_m = \vec{0}$ is the trivial one $c_1 = 0, \ldots, d_m = 0$. So any
member of the intersection of the spans must be the zero vector because in
$c_1\vec{s}_1 + \cdots + c_n\vec{s}_n = d_1\vec{t}_1 + \cdots + d_m\vec{t}_m$ each scalar is zero."

(b) Give an example showing that the conjecture is false.

(c) Find linearly independent sets $S$ and $T$ so that the union of $S - (S \cap T)$ and
$T - (S \cap T)$ is linearly independent, but the union $S \cup T$ is not linearly independent.

(d) Characterize when the union of two linearly independent sets is linearly
independent, in terms of the intersection of spans.

/ 1.38 For Corollary 1.14, 

(a) fill in the induction for the proof; 

(b) give an alternate proof that starts with the empty set and builds a sequence 
of linearly independent subsets of the given finite set until one appears with the 
same span as the given set. 

1.39 With some calculation we can get formulas to determine whether or not a set
of vectors is linearly independent.
(a) Show that this subset of $\mathbb{R}^2$



dent. 




is linearly independent if and only if ad 
(b) Show that this subset of 



bc/0. 




is linearly independent iff aei + bf g + cdh — hf a — idb — gee ^ 0. 
(c) When is this subset of

linearly independent?

(d) This is an opinion question: for a set of four vectors from $\mathbb{R}^4$, must there be a
formula involving the sixteen entries that determines independence of the set?
(You needn't produce such a formula, just decide if one exists.)
✓ 1.40 (a) Prove that a set of two perpendicular nonzero vectors from $\mathbb{R}^n$ is linearly
independent when $n > 1$.

(b) What if n = 1? n = 0? 

(c) Generalize to more than two vectors. 

1.41 Consider the set of functions from the open interval $(-1..1)$ to $\mathbb{R}$.

(a) Show that this set is a vector space under the usual operations.

(b) Recall the formula for the sum of an infinite geometric series: $1 + x + x^2 + \cdots = 1/(1 - x)$
for all $x \in (-1..1)$. Why does this not express a dependence inside of
the set $\{g(x) = 1/(1 - x), f_0(x) = 1, f_1(x) = x, f_2(x) = x^2, \ldots\}$ (in the vector space
that we are considering)? (Hint. Review the definition of linear combination.)

(c) Show that the set in the prior item is linearly independent. 

This shows that some vector spaces exist with linearly independent subsets that 
are infinite. 

1.42 Show that, where S is a subspace of V, if a subset T of S is linearly independent 
in S then T is also linearly independent in V. Is that 'only if'?

III Basis and Dimension

The prior section ends with the statement that a spanning set is minimal when it
is linearly independent and a linearly independent set is maximal when it spans
the space. So the notions of minimal spanning set and maximal independent set 
coincide. In this section we will name this idea and study its properties. 



III.1 Basis

1.1 Definition A basis for a vector space is a sequence of vectors that is linearly
independent and that spans the space.

We denote a basis with angle brackets $\langle \vec\beta_1, \vec\beta_2, \ldots \rangle$ because this is a sequence,*
meaning that the order of the elements is significant. Bases are different if they
contain the same elements but in different orders.

(We say that a sequence is linearly independent if the multiset consisting of 
the elements of the sequence is independent. Similarly, a sequence spans the 
space if the set of the elements of the sequence spans the space.) 

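1.2 Example This is a basis for $\mathbb{R}^2$.

$$\left\langle \begin{pmatrix} 2 \\ 4 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\rangle$$

It is linearly independent

$$c_1 \begin{pmatrix} 2 \\ 4 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \implies \begin{aligned} 2c_1 + 1c_2 &= 0 \\ 4c_1 + 1c_2 &= 0 \end{aligned} \implies c_1 = c_2 = 0$$

and it spans $\mathbb{R}^2$.

$$\begin{aligned} 2c_1 + 1c_2 &= x \\ 4c_1 + 1c_2 &= y \end{aligned} \implies c_2 = 2x - y \text{ and } c_1 = (y - x)/2$$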
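The two calculations in this example can be checked by machine. The following is a minimal sketch, assuming the basis vectors are (2, 4) and (1, 1) as read off from the coefficients of the two linear systems; the helper name `coords_2d` is hypothetical and is not from the text.

```python
from fractions import Fraction

def coords_2d(b1, b2, v):
    """Solve c1*b1 + c2*b2 = v for 2-component vectors using Cramer's rule."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1 and b2 are linearly dependent, so they do not form a basis")
    c1 = Fraction(v[0] * b2[1] - b2[0] * v[1], det)
    c2 = Fraction(b1[0] * v[1] - v[0] * b1[1], det)
    return c1, c2

b1, b2 = (2, 4), (1, 1)
# Independence: the zero vector has only the trivial representation.
print(coords_2d(b1, b2, (0, 0)))
# Spanning: any (x, y) is reached, with c1 = (y - x)/2 and c2 = 2x - y.
x, y = 3, 5
print(coords_2d(b1, b2, (x, y)))
```

Exact fractions are used instead of floating point so that the results can be compared directly with the formulas $c_1 = (y - x)/2$ and $c_2 = 2x - y$.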

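1.3 Example This basis for $\mathbb{R}^2$

$$\left\langle \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 4 \end{pmatrix} \right\rangle$$

differs from the prior one because the vectors are in a different order. The
verification that it is a basis is just as in the prior example.

1.4 Example The space $\mathbb{R}^2$ has many bases. Another one is this.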





* More information on sequences is in the appendix. 
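The verification is easy.

1.5 Definition For any $\mathbb{R}^n$

$$\mathcal{E}_n = \left\langle \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \ldots, \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} \right\rangle$$

is the standard (or natural) basis. We denote these vectors $\vec{e}_1, \ldots, \vec{e}_n$.

Calculus books refer to $\mathbb{R}^2$'s standard basis vectors $\vec{\imath}$ and $\vec{\jmath}$ instead of $\vec{e}_1$ and $\vec{e}_2$,
and they refer to $\mathbb{R}^3$'s standard basis vectors $\vec{\imath}$, $\vec{\jmath}$, and $\vec{k}$ instead of $\vec{e}_1$, $\vec{e}_2$, and
$\vec{e}_3$. Note that $\vec{e}_1$ means something different in a discussion of $\mathbb{R}^3$ than it means
in a discussion of $\mathbb{R}^2$.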



1.6 Example Consider the space $\{a \cdot \cos\theta + b \cdot \sin\theta \mid a, b \in \mathbb{R}\}$ of functions of
the real variable $\theta$. This is a natural basis.

$$\langle 1 \cdot \cos\theta + 0 \cdot \sin\theta,\ 0 \cdot \cos\theta + 1 \cdot \sin\theta \rangle = \langle \cos\theta, \sin\theta \rangle$$

Another, more generic, basis is $\langle \cos\theta - \sin\theta,\ 2\cos\theta + 3\sin\theta \rangle$. Verification that
these two are bases is Exercise 22.

1.7 Example A natural basis for the vector space of cubic polynomials is
$\langle 1, x, x^2, x^3 \rangle$. Two other bases for this space are $\langle x^3, 3x^2, 6x, 6 \rangle$ and
$\langle 1, 1 + x, 1 + x + x^2, 1 + x + x^2 + x^3 \rangle$. Checking that these are linearly independent and span
the space is easy.

1.8 Example The trivial space $\{\vec{0}\}$ has only one basis, the empty one $\langle\rangle$.

1.9 Example The space of finite-degree polynomials has a basis with infinitely
many elements $\langle 1, x, x^2, \ldots \rangle$.

1.10 Example We have seen bases before. In the first chapter we described the
solution set of homogeneous systems such as this one

$$\begin{aligned} x + y - w &= 0 \\ z + w &= 0 \end{aligned}$$

by parametrizing.

$$\left\{ \begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} y + \begin{pmatrix} 1 \\ 0 \\ -1 \\ 1 \end{pmatrix} w \;\middle|\; y, w \in \mathbb{R} \right\}$$

Thus the vector space of solutions is the span of a two-element set. This
two-vector set is also linearly independent; that is easy to check. Therefore the
solution set is a subspace of $\mathbb{R}^4$ with a basis comprised of the above two elements.
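The claims in this example are easy to test numerically. The sketch below assumes the two parametrizing vectors are (-1, 1, 0, 0) and (1, 0, -1, 1), which is what solving the system for x and z in terms of y and w gives; the function names are hypothetical.

```python
def in_solution_set(v):
    """True when (x, y, z, w) satisfies x + y - w = 0 and z + w = 0."""
    x, y, z, w = v
    return x + y - w == 0 and z + w == 0

def combine(y, w):
    """The parametrized vector y*(-1, 1, 0, 0) + w*(1, 0, -1, 1)."""
    b1, b2 = (-1, 1, 0, 0), (1, 0, -1, 1)
    return tuple(y * p + w * q for p, q in zip(b1, b2))

grid = [(y, w) for y in range(-3, 4) for w in range(-3, 4)]
# Every combination of the two vectors solves the system.
print(all(in_solution_set(combine(y, w)) for y, w in grid))
# On this grid only the trivial combination gives the zero vector.
# This is a spot check on integer coefficients, not a proof of independence.
print([(y, w) for y, w in grid if combine(y, w) == (0, 0, 0, 0)])
```

Independence in general follows by reading the second and fourth components of $y\vec{b}_1 + w\vec{b}_2 = \vec{0}$, which force $y = 0$ and $w = 0$.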
1.11 Example Parametrization helps find bases for other vector spaces, not just
for solution sets of homogeneous systems. To find a basis for this subspace of
$\mathcal{M}_{2\times 2}$

$$\left\{ \begin{pmatrix} a & b \\ c & 0 \end{pmatrix} \;\middle|\; a + b - 2c = 0 \right\}$$

we rewrite the condition as $a = -b + 2c$.

$$\left\{ \begin{pmatrix} -b + 2c & b \\ c & 0 \end{pmatrix} \;\middle|\; b, c \in \mathbb{R} \right\} = \left\{ b \begin{pmatrix} -1 & 1 \\ 0 & 0 \end{pmatrix} + c \begin{pmatrix} 2 & 0 \\ 1 & 0 \end{pmatrix} \;\middle|\; b, c \in \mathbb{R} \right\}$$

Thus, this is a natural candidate for a basis.

$$\left\langle \begin{pmatrix} -1 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 2 & 0 \\ 1 & 0 \end{pmatrix} \right\rangle$$

The above work shows that it spans the space. Linear independence is also easy.

Consider again Example 1.2. To verify linear independence we looked at
linear combinations of the set's members that total to the zero vector
$c_1\vec\beta_1 + c_2\vec\beta_2 = \binom{0}{0}$. The resulting calculation shows that such a combination is unique,
that $c_1$ must be 0 and $c_2$ must be 0. To verify that the set spans the space we
looked at linear combinations that total to any member of the space
$c_1\vec\beta_1 + c_2\vec\beta_2 = \binom{x}{y}$. We only noted in that example that such a combination exists, that for each
$x, y$ there is a $c_1, c_2$, but in fact the calculation also shows that the combination
is unique: $c_1$ must be $(y - x)/2$ and $c_2$ must be $2x - y$.

1.12 Theorem In any vector space, a subset is a basis if and only if each vector
in the space can be expressed as a linear combination of elements of the subset
in a unique way.

We consider linear combinations to be the same if they differ only in the order
of summands or in the addition or deletion of terms of the form '$0 \cdot \vec\beta$'.

Proof A sequence is a basis if and only if its vectors form a set that spans and
that is linearly independent. And, a subset is a spanning set if and only if each
vector in the space is a linear combination of elements of that subset in at least
one way. Thus we need only show that a spanning subset is linearly independent
if and only if every vector in the space is a linear combination of elements from
the subset in at most one way.

Consider two expressions of a vector as a linear combination of the members
of the subset. We can rearrange the two sums and if necessary add some
$0 \cdot \vec\beta_i$ terms so that the two sums combine the same $\vec\beta$'s in the same order:
$\vec{v} = c_1\vec\beta_1 + c_2\vec\beta_2 + \cdots + c_n\vec\beta_n$ and $\vec{v} = d_1\vec\beta_1 + d_2\vec\beta_2 + \cdots + d_n\vec\beta_n$. Now

$$c_1\vec\beta_1 + c_2\vec\beta_2 + \cdots + c_n\vec\beta_n = d_1\vec\beta_1 + d_2\vec\beta_2 + \cdots + d_n\vec\beta_n$$

holds if and only if

$$(c_1 - d_1)\vec\beta_1 + \cdots + (c_n - d_n)\vec\beta_n = \vec{0}$$

holds. So, asserting that each coefficient in the lower equation is zero is the same
thing as asserting that $c_i = d_i$ for each i, that is, that every vector is expressible
as a linear combination of the $\vec\beta$'s in a unique way. QED



1.13 Definition In a vector space with basis B the representation of $\vec{v}$ with
respect to B is the column vector of the coefficients used to express $\vec{v}$ as a linear
combination of the basis vectors:

$$\operatorname{Rep}_B(\vec{v}) = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}$$

where $B = \langle \vec\beta_1, \ldots, \vec\beta_n \rangle$ and $\vec{v} = c_1\vec\beta_1 + c_2\vec\beta_2 + \cdots + c_n\vec\beta_n$. The $c$'s are the
coordinates of $\vec{v}$ with respect to B.

Definition 1.1 requires that a basis is a sequence, that the order of the
basis elements matters, in order to make this definition possible. Without that
requirement we couldn't write these $c_i$'s in order.

We will later do representations in contexts that involve more than one basis. 
To help keep straight which representation is with respect to which basis we 
shall often write the basis name as a subscript on the column vector. 

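1.14 Example In $\mathcal{P}_3$, with respect to the basis $B = \langle 1, 2x, 2x^2, 2x^3 \rangle$, the
representation of $x + x^2$ is

$$\operatorname{Rep}_B(x + x^2) = \begin{pmatrix} 0 \\ 1/2 \\ 1/2 \\ 0 \end{pmatrix}_B$$

(note that the coordinates are scalars, not vectors). With respect to a different
basis $D = \langle 1 + x, 1 - x, x + x^2, x + x^3 \rangle$, the representation

$$\operatorname{Rep}_D(x + x^2) = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}_D$$

is different.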
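Finding a representation amounts to solving a linear system, so it can be automated. The following is a minimal sketch that stores each cubic polynomial as its coefficient list (constant term first). It assumes the bases are $B = \langle 1, 2x, 2x^2, 2x^3 \rangle$ and $D = \langle 1 + x, 1 - x, x + x^2, x + x^3 \rangle$; the exponents in D are an assumption where the source is hard to read, and the function name `rep` is hypothetical.

```python
from fractions import Fraction

def rep(basis, v):
    """Coordinates of v with respect to basis, by Gauss-Jordan elimination.

    Assumes basis really is a basis, so the solution exists and is unique.
    """
    n = len(basis)
    # Augmented matrix: column j holds basis[j], the last column holds v.
    aug = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
           for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [a / lead for a in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

# Coefficient lists are [constant, x, x^2, x^3].
B = [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2]]
D = [[1, 1, 0, 0], [1, -1, 0, 0], [0, 1, 1, 0], [0, 1, 0, 1]]
v = [0, 1, 1, 0]  # the polynomial x + x^2

print(rep(B, v))
print(rep(D, v))
```

The same vector gets different coordinate columns under the two bases, which is why the basis name is attached to the column as a subscript.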

1.15 Remark This use of column notation and the term 'coordinates' has both a 
down side and an up side. 

The down side is that representations look like vectors from $\mathbb{R}^n$, which can
be confusing when the vector space we are working with is $\mathbb{R}^n$, especially since
we sometimes omit the subscript base. We must then infer the intent from the
context. For example, the phrase 'in $\mathbb{R}^2$, where $\vec{v} = \binom{3}{2}$' refers to the plane
vector that, when in canonical position, ends at $(3, 2)$. To find the coordinates
of that vector with respect to the basis

B = 

we solve 

to get that $c_1 = 3$ and $c_2 = 1/2$. Then we have this.

RepB (v) = 





Here, although we've omitted the subscript B from the column, the fact that 
the right side is a representation is clear from the context. 

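The advantage of the notation and the term 'coordinates' is that they
generalize the familiar case: in $\mathbb{R}^n$ and with respect to the standard basis $\mathcal{E}_n$, the
vector starting at the origin and ending at $(v_1, \ldots, v_n)$ has this representation.

$$\operatorname{Rep}_{\mathcal{E}_n}\left( \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} \right) = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}_{\mathcal{E}_n}$$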



Our main use of representations will come in the third chapter. The definition 
appears here because the fact that every vector is a linear combination of basis 
vectors in a unique way is a crucial property of bases, and also to help make two 
points. First, we fix an order for the elements of a basis so that we can state the 
coordinates in that order. Second, for calculation of coordinates, among other
things, we shall restrict our attention to spaces with bases having only finitely 
many elements. We will see that in the next subsection. 

Exercises 

/ 1.16 Decide if each is a basis for . 

•^"(i)'(;)'0' 

/ 1.17 Represent the vector with respect to the basis. 




2j' aV V 1. 

(b) + x3, D = (1,1 +x, 1 + x + x^l + x + x2 + x3) C ^3 

, £4 C R-* 



(c: 



-1 


V 1/ 
1.18 Find a basis for the space of all quadratic polynomials. Must any such
basis contain a polynomial of each degree: degree zero, degree one, and degree two?

1.19 Find a basis for the solution set of this system.

$$\begin{aligned} x_1 - 4x_2 + 3x_3 - x_4 &= 0 \\ 2x_1 - 8x_2 + 6x_3 - 2x_4 &= 0 \end{aligned}$$

✓ 1.20 Find a basis for $\mathcal{M}_{2\times 2}$, the space of $2 \times 2$ matrices.
✓ 1.21 Find a basis for each.

(a) The subspace $\{a_2x^2 + a_1x + a_0 \mid a_2 - 2a_1 = a_0\}$ of $\mathcal{P}_2$

(b) The space of three-wide row vectors whose first and second components add
to zero

(c) This subspace of the $2 \times 2$ matrices

1.22 Check Example 1.6.
✓ 1.23 Find the span of each set and then find a basis for that span.

(a) $\{1 + x, 1 + 2x\}$ in $\mathcal{P}_2$ (b) $\{2 - 2x, 3 + 4x^2\}$ in $\mathcal{P}_2$
✓ 1.24 Find a basis for each of these subspaces of the space $\mathcal{P}_3$ of cubic polynomials.

(a) The subspace of cubic polynomials $p(x)$ such that $p(7) = 0$

(b) The subspace of polynomials $p(x)$ such that $p(7) = 0$ and $p(5) = 0$

(c) The subspace of polynomials $p(x)$ such that $p(7) = 0$, $p(5) = 0$, and $p(3) = 0$

(d) The space of polynomials $p(x)$ such that $p(7) = 0$, $p(5) = 0$, $p(3) = 0$,
and $p(1) = 0$

1.25 We've seen that the result of reordering a basis can be another basis. Must it 
be? 

1.26 Can a basis contain a zero vector? 

✓ 1.27 Let $\langle \vec\beta_1, \vec\beta_2, \vec\beta_3 \rangle$ be a basis for a vector space.

(a) Show that $\langle c_1\vec\beta_1, c_2\vec\beta_2, c_3\vec\beta_3 \rangle$ is a basis when $c_1, c_2, c_3 \neq 0$. What happens
when at least one $c_i$ is 0?

(b) Prove that $\langle \vec\alpha_1, \vec\alpha_2, \vec\alpha_3 \rangle$ is a basis where $\vec\alpha_i = \vec\beta_1 + \vec\beta_i$.

1.28 Find one vector $\vec{v}$ that will make each into a basis for the space.

✓ 1.29 Where $\langle \vec\beta_1, \ldots, \vec\beta_n \rangle$ is a basis, show that in this equation

$$c_1\vec\beta_1 + \cdots + c_k\vec\beta_k = c_{k+1}\vec\beta_{k+1} + \cdots + c_n\vec\beta_n$$

each of the $c_i$'s is zero. Generalize.

1.30 A basis contains some of the vectors from a vector space; can it contain them
all?

1.31 Theorem 1.12 shows that, with respect to a basis, every linear combination is
unique. If a subset is not a basis, can linear combinations be not unique? If so,
must they be?

✓ 1.32 A square matrix is symmetric if for all indices i and j, entry i, j equals entry
j, i.

(a) Find a basis for the vector space of symmetric $2 \times 2$ matrices.

(b) Find a basis for the space of symmetric $3 \times 3$ matrices.

(c) Find a basis for the space of symmetric $n \times n$ matrices.
✓ 1.33 We can show that every basis for $\mathbb{R}^3$ contains the same number of vectors.

(a) Show that no linearly independent subset of $\mathbb{R}^3$ contains more than three
vectors.

(b) Show that no spanning subset of $\mathbb{R}^3$ contains fewer than three vectors. Hint:
recall how to calculate the span of a set and show that this method cannot yield
all of $\mathbb{R}^3$ when we apply it to fewer than three vectors.

1.34 One of the exercises in the Subspaces subsection shows that the set

$$\left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\middle|\; x + y + z = 1 \right\}$$

is a vector space under these operations.

$$\begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} + \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix} = \begin{pmatrix} x_1 + x_2 - 1 \\ y_1 + y_2 \\ z_1 + z_2 \end{pmatrix} \qquad r \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} rx - r + 1 \\ ry \\ rz \end{pmatrix}$$

Find a basis.



III.2 Dimension

In the prior subsection we defined the basis of a vector space and we saw that 
a space can have many different bases. So we cannot talk about "the" basis 
for a vector space. True, some vector spaces have bases that strike us as more 
natural than others, for instance, $\mathbb{R}^2$'s basis $\mathcal{E}_2$ or $\mathcal{P}_2$'s basis $\langle 1, x, x^2 \rangle$. But for
the vector space $\{a_2x^2 + a_1x + a_0 \mid 2a_2 - a_0 = a_1\}$, no particular basis leaps
out at us as the natural one. We cannot, in general, associate with a space any 
single basis that best describes that space. 

We can however find something about the bases that is uniquely associated 
with the space. This subsection shows that any two bases for a space have the 
same number of elements. So with each space we can associate a number, the 
number of vectors in any of its bases. 

Before we start, we first limit our attention to spaces where at least one basis 
has only finitely many members. 

2.1 Definition A vector space is finite-dimensional if it has a basis with only
finitely many vectors.

One space that is not finite-dimensional is the set of polynomials with real
coefficients, Example 1.9 (this space is not spanned by any finite subset since
that would contain a polynomial of largest degree but this space has polynomials
of all degrees). Such spaces are interesting and important, but we will focus in
a different direction. From now on we will study only finite-dimensional vector
spaces. We shall take the term 'vector space' to mean 'finite-dimensional vector
space'.
2.2 Remark One reason for sticking to finite-dimensional spaces is so that the
representation of a vector with respect to a basis is a finitely-tall vector and we 
can easily write it. Another reason is that the statement 'any infinite-dimensional 
vector space has a basis' is equivalent to a statement called the Axiom of Choice 
[Blass 1984] and so covering this would move us far past this book's scope. (A 
discussion of the Axiom of Choice is in the Frequently Asked Questions list for 
sci.math, and another accessible one is [Rucker].) 

To prove the main theorem we shall use a technical result, the Exchange 
Lemma. We first illustrate it with an example. 

2.3 Example Here is a basis for $\mathbb{R}^3$ and a vector given as a linear combination of
members of that basis. 



B = 





+ 0- 



'0^ 

.2; 



Two of the basis vectors have non-zero coefficients. Pick one, for instance the 
first. Replace it with the vector that we've expressed as the combination 



B = 




and the result is another basis for $\mathbb{R}^3$.



2.4 Lemma (Exchange Lemma) Assume that $B = \langle \vec\beta_1, \ldots, \vec\beta_n \rangle$ is a basis for a
vector space, and that for the vector $\vec{v}$ the relationship
$\vec{v} = c_1\vec\beta_1 + c_2\vec\beta_2 + \cdots + c_n\vec\beta_n$ has $c_i \neq 0$. Then exchanging $\vec\beta_i$ for $\vec{v}$ yields another basis for the space.

Proof Call the outcome of the exchange $\hat{B} = \langle \vec\beta_1, \ldots, \vec\beta_{i-1}, \vec{v}, \vec\beta_{i+1}, \ldots, \vec\beta_n \rangle$.

We first show that $\hat{B}$ is linearly independent. Any relationship
$d_1\vec\beta_1 + \cdots + d_i\vec{v} + \cdots + d_n\vec\beta_n = \vec{0}$ among the members of $\hat{B}$, after substitution for $\vec{v}$,

$$d_1\vec\beta_1 + \cdots + d_i \cdot (c_1\vec\beta_1 + \cdots + c_i\vec\beta_i + \cdots + c_n\vec\beta_n) + \cdots + d_n\vec\beta_n = \vec{0} \qquad (*)$$

gives a linear relationship among the members of B. The basis B is linearly
independent, so the coefficient $d_ic_i$ of $\vec\beta_i$ is zero. Because we assumed that $c_i$ is
nonzero, $d_i = 0$. Using this in equation $(*)$ above gives that all of the other $d$'s
are also zero. Therefore $\hat{B}$ is linearly independent.

We finish by showing that $\hat{B}$ has the same span as B. Half of this argument,
that $[\hat{B}] \subseteq [B]$, is easy; we can write any member $d_1\vec\beta_1 + \cdots + d_i\vec{v} + \cdots + d_n\vec\beta_n$ of $[\hat{B}]$
as $d_1\vec\beta_1 + \cdots + d_i \cdot (c_1\vec\beta_1 + \cdots + c_n\vec\beta_n) + \cdots + d_n\vec\beta_n$, which is a linear combination
of linear combinations of members of B, and hence is in $[B]$. For the $[B] \subseteq [\hat{B}]$ half
of the argument, recall that when $\vec{v} = c_1\vec\beta_1 + \cdots + c_n\vec\beta_n$ with $c_i \neq 0$, then we can
rearrange the equation to $\vec\beta_i = (-c_1/c_i)\vec\beta_1 + \cdots + (1/c_i)\vec{v} + \cdots + (-c_n/c_i)\vec\beta_n$.
Now, consider any member $d_1\vec\beta_1 + \cdots + d_i\vec\beta_i + \cdots + d_n\vec\beta_n$ of $[B]$, substitute for
$\vec\beta_i$ its expression as a linear combination of the members of $\hat{B}$, and recognize,
as in the first half of this argument, that the result is a linear combination of
linear combinations of members of $\hat{B}$, and hence is in $[\hat{B}]$. QED
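The role of the hypothesis $c_i \neq 0$ can be seen in a small experiment. The sketch below uses a hypothetical basis of $\mathbb{R}^3$ and hypothetical coefficients chosen only for illustration (they are not claimed to be the ones in Example 2.3), and it tests each exchanged sequence by computing a rank with exact fractions.

```python
from fractions import Fraction

def rank(rows):
    """Number of nonzero rows after Gauss's Method, using exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the row that receives the next pivot
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def exchange(basis, coeffs, i):
    """Replace basis[i] by v = sum of coeffs[j]*basis[j]; return the new sequence."""
    v = [sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(len(basis[0]))]
    return basis[:i] + [v] + basis[i + 1:]

basis = [[1, 0, 0], [1, 1, 0], [0, 0, 2]]   # hypothetical basis of R^3
coeffs = [3, 2, 0]                          # v = 3*b1 + 2*b2 + 0*b3

for i in range(3):
    new = exchange(basis, coeffs, i)
    print(i, coeffs[i] != 0, rank(new) == 3)
```

Exchanging at a position with a nonzero coefficient keeps the rank at 3, so the result is again a basis. Exchanging at the position whose coefficient is 0 puts $\vec{v}$ into the span of the remaining vectors, and the rank drops.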

2.5 Theorem In any finite-dimensional vector space, all bases have the same 
number of elements. 

Proof Fix a vector space with at least one finite basis. Choose, from among
all of this space's bases, one $B = \langle \vec\beta_1, \ldots, \vec\beta_n \rangle$ of minimal size. We will show
that any other basis $D = \langle \vec\delta_1, \vec\delta_2, \ldots \rangle$ also has the same number of members, n.
Because B has minimal size, D has no fewer than n vectors. We will argue that
it cannot have more than n vectors.

The basis B spans the space and $\vec\delta_1$ is in the space, so $\vec\delta_1$ is a linear
combination of elements of B, and a nontrivial one because $\vec\delta_1$, as a member of a
basis, is not the zero vector. By the Exchange Lemma, we can swap $\vec\delta_1$ for a
vector from B, resulting in a basis $B_1$, where one element is $\vec\delta_1$ and all of the
$n - 1$ other elements are $\vec\beta$'s.

The prior paragraph forms the basis step for an induction argument. The
inductive step starts with a basis $B_k$ (for $1 \leq k < n$) containing k members of D
and $n - k$ members of B. We know that D has at least n members so there is a
$\vec\delta_{k+1}$. Represent it as a linear combination of elements of $B_k$. The key point: in
that representation, at least one of the nonzero scalars must be associated with
a $\vec\beta_i$ or else that representation would be a nontrivial linear relationship among
elements of the linearly independent set D. Exchange $\vec\delta_{k+1}$ for $\vec\beta_i$ to get a new
basis $B_{k+1}$ with one $\vec\delta$ more and one $\vec\beta$ fewer than the previous basis $B_k$.

Repeat the inductive step until no $\vec\beta$'s remain, so that $B_n$ contains $\vec\delta_1, \ldots, \vec\delta_n$.
Now, D cannot have more than these n vectors because any $\vec\delta_{n+1}$ that remains
would be in the span of $B_n$ (since it is a basis) and hence would be a linear
combination of the other $\vec\delta$'s, contradicting that D is linearly independent. QED

2.6 Definition The dimension of a vector space is the number of vectors in any 
of its bases. 

2.7 Example Any basis for $\mathbb{R}^n$ has n vectors since the standard basis $\mathcal{E}_n$ has n
vectors. Thus, this definition generalizes the most familiar use of the term, that
$\mathbb{R}^n$ is n-dimensional.

2.8 Example The space $\mathcal{P}_n$ of polynomials of degree at most n has dimension
$n + 1$. We can show this by exhibiting any basis ($\langle 1, x, \ldots, x^n \rangle$ comes to mind)
and counting its members.

2.9 Example A trivial space is zero-dimensional since its basis is empty. 

Again, although we sometimes say 'finite-dimensional' as a reminder, in 
the rest of this book we assume that all vector spaces are finite-dimensional. 
An instance of this is that in the next result the word 'space' means 'finite- 
dimensional vector space'. 
2.10 Corollary No linearly independent set can have a size greater than the
dimension of the enclosing space. 

Proof The proof of Theorem 2.5 never uses that D spans the space, only that 
it is linearly independent. QED 

2.11 Example Recall the subspace diagram from the prior section showing the
subspaces of $\mathbb{R}^3$. Each subspace shown is described with a minimal spanning set,
for which we now have the term 'basis'. The whole space has a basis with three
members, the plane subspaces have bases with two members, the line subspaces
have bases with one member, and the trivial subspace has a basis with zero
members. When we saw that diagram we could not show that these are $\mathbb{R}^3$'s
only subspaces. We can show it now. The prior corollary proves that the only
subspaces of $\mathbb{R}^3$ are either three-, two-, one-, or zero-dimensional. Therefore, the
diagram indicates all of the subspaces. There are no subspaces somehow, say, 
between lines and planes. 

2.12 Corollary Any linearly independent set can be expanded to make a basis. 

Proof If a linearly independent set is not already a basis then it must not span 
the space. Adding to the set a vector that is not in the span will preserve linear 
independence. Keep adding until the resulting set does span the space, which 
the prior corollary shows will happen after only a finite number of steps. QED 

2.13 Corollary Any spanning set can be shrunk to a basis. 

Proof Call the spanning set S. If S is empty then it is already a basis (the 
space must be a trivial space). If S = {0} then it can be shrunk to the empty 
basis, thereby making it linearly independent, without changing its span. 

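Otherwise, S contains a vector $\vec{s}_1$ with $\vec{s}_1 \neq \vec{0}$ and we can form a basis
$B_1 = \langle \vec{s}_1 \rangle$. If $[B_1] = [S]$ then we are done. If not then there is a $\vec{s}_2 \in S$ such
that $\vec{s}_2 \notin [B_1]$ (otherwise every member of S, and hence all of $[S]$, would lie in $[B_1]$).
Let $B_2 = \langle \vec{s}_1, \vec{s}_2 \rangle$; if $[B_2] = [S]$ then we are done.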

We can repeat this process until the spans are equal, which must happen in 
at most finitely many steps. QED 
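The proofs of the last two corollaries are constructive, and each can be turned into a short procedure. The following is a minimal sketch for subsets of $\mathbb{R}^n$ only; the function names are hypothetical, and the choice to try the standard basis vectors as candidates for expansion is an assumption made for the example (any vector outside the current span would serve).

```python
from fractions import Fraction

def rank(rows):
    """Number of nonzero rows after Gauss's Method, using exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def shrink_to_basis(spanning):
    """Keep each vector only if it is outside the span of those already kept."""
    basis = []
    for s in spanning:
        if rank(basis + [s]) > len(basis):
            basis.append(s)
    return basis

def extend_to_basis(independent, n):
    """Add standard basis vectors of R^n that lie outside the current span."""
    basis = list(independent)
    for k in range(n):
        e = [1 if j == k else 0 for j in range(n)]
        if rank(basis + [e]) > len(basis):
            basis.append(e)
    return basis

print(shrink_to_basis([[0, 0, 0], [1, 1, 0], [2, 2, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]))
print(extend_to_basis([[1, 1, 0]], 3))
```

Both loops terminate because, by Corollary 2.10, the list of kept vectors can never grow beyond the dimension of the space.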

2.14 Corollary In an n-dimensional space, a set composed of n vectors is linearly 
independent if and only if it spans the space. 

Proof First we will show that a subset with n vectors is linearly independent if
and only if it is a basis. The 'if' is trivially true: bases are linearly independent.
'Only if' holds because a linearly independent set can be expanded to a basis,
but a basis has n elements, so this expansion is actually the set that we began
with.

To finish, we will show that any subset with n vectors spans the space if and
only if it is a basis. Again, 'if' is trivial. 'Only if' holds because any spanning
set can be shrunk to a basis, but a basis has n elements and so this shrunken
set is just the one we started with. QED
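For subsets of $\mathbb{R}^n$ the corollary can be observed directly: with exactly n vectors, the independence test and the spanning test always agree. The sketch below is an illustration with hypothetical example vectors, and `check` is a hypothetical helper name.

```python
from fractions import Fraction

def rank(rows):
    """Number of nonzero rows after Gauss's Method, using exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def check(vectors, n):
    """For vectors from R^n, return (is_independent, spans_Rn)."""
    r = rank(vectors)
    return r == len(vectors), r == n

print(check([(2, 4), (1, 1)], 2))         # two vectors in R^2: both hold
print(check([(2, 4), (1, 2)], 2))         # two vectors in R^2: both fail
print(check([(1, 0, 0), (0, 1, 0)], 3))   # only two vectors in R^3
```

The last call shows why the count matters: with fewer than n vectors a set can be independent without spanning.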
The main result of this subsection, that all of the bases in a finite-dimensional
vector space have the same number of elements, is the single most important 
result in this book because, as Example 2.11 shows, it describes what vector 
spaces and subspaces there can be. We will see more in the next chapter. 

One immediate consequence brings us back to when we considered the two 
things that could be meant by the term 'minimal spanning set'. At that point we 
defined 'minimal' as linearly independent but we noted that another reasonable 
interpretation of the term is that a spanning set is 'minimal' when it has the 
fewest number of elements of any set with the same span. Now that we have 
shown that all bases have the same number of elements, we know that the two 
senses of 'minimal' are equivalent. 

Exercises 

Assume that all spaces are finite-dimensional unless otherwise stated.
✓ 2.15 Find a basis for, and the dimension of, $\mathcal{P}_2$.

2.16 Find a basis for, and the dimension of, the solution set of this system.

$$\begin{aligned} x_1 - 4x_2 + 3x_3 - x_4 &= 0 \\ 2x_1 - 8x_2 + 6x_3 - 2x_4 &= 0 \end{aligned}$$

✓ 2.17 Find a basis for, and the dimension of, $\mathcal{M}_{2\times 2}$, the vector space of $2 \times 2$ matrices.
2.18 Find the dimension of the vector space of matrices

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

subject to each condition.

(a) $a, b, c, d \in \mathbb{R}$

(b) $a - b + 2c = 0$ and $d \in \mathbb{R}$

(c) $a + b + c = 0$, $a + b - c = 0$, and $d \in \mathbb{R}$
✓ 2.19 Find the dimension of each.

(a) The space of cubic polynomials $p(x)$ such that $p(7) = 0$

(b) The space of cubic polynomials $p(x)$ such that $p(7) = 0$ and $p(5) = 0$

(c) The space of cubic polynomials $p(x)$ such that $p(7) = 0$, $p(5) = 0$, and $p(3) = 0$

(d) The space of cubic polynomials $p(x)$ such that $p(7) = 0$, $p(5) = 0$, $p(3) = 0$,
and $p(1) = 0$

2.20 What is the dimension of the span of the set $\{\cos^2\theta, \sin^2\theta, \cos 2\theta, \sin 2\theta\}$? This
span is a subspace of the space of all real-valued functions of one real variable.

2.21 Find the dimension of $\mathbb{C}^{47}$, the vector space of 47-tuples of complex numbers.

2.22 What is the dimension of the vector space $\mathcal{M}_{3\times 5}$ of $3 \times 5$ matrices?
✓ 2.23 Show that this is a basis for $\mathbb{R}^4$.






















1 




1 




1 












1 




1 














^1/ 



(We can use the results of this subsection to simplify this job.) 
2.24 Refer to Example 2.11. 

(a) Sketch a similar subspace diagram for $\mathcal{P}_2$.

(b) Sketch one for $\mathcal{M}_{2\times 2}$.
✓ 2.25 Where S is a set, the functions $f\colon S \to \mathbb{R}$ form a vector space under the natural
operations: the sum $f + g$ is the function given by $(f + g)(s) = f(s) + g(s)$ and the
scalar product is $(r \cdot f)(s) = r \cdot f(s)$. What is the dimension of the space resulting for
each domain?

(a) $S = \{1\}$ (b) $S = \{1, 2\}$ (c) $S = \{1, \ldots, n\}$

2.26 (See Exercise 25.) Prove that this is an infinite-dimensional space: the set of
all functions $f\colon \mathbb{R} \to \mathbb{R}$ under the natural operations.

2.27 (See Exercise 25.) What is the dimension of the vector space of functions
$f\colon S \to \mathbb{R}$, under the natural operations, where the domain S is the empty set?

2.28 Show that any set of four vectors in $\mathbb{R}^2$ is linearly dependent.

2.29 Show that $\langle \vec\alpha_1, \vec\alpha_2, \vec\alpha_3 \rangle \subset \mathbb{R}^3$ is a basis if and only if there is no plane through
the origin containing all three vectors.

2.30 (a) Prove that any subspace of a finite dimensional space has a basis. 

(b) Prove that any subspace of a finite dimensional space is finite dimensional. 

2.31 Where is the finiteness of B used in Theorem 2.5? 

✓ 2.32 Prove that if U and W are both three-dimensional subspaces of $\mathbb{R}^5$ then $U \cap W$
is non-trivial. Generalize.

2.33 A basis for a space consists of elements of that space. So we are naturally led to
how the property 'is a basis' interacts with operations $\subseteq$ and $\cap$ and $\cup$. (Of course,
a basis is actually a sequence in that it is ordered, but there is a natural extension
of these operations.)

(a) Consider first how bases might be related by $\subseteq$. Assume that U, W are
subspaces of some vector space and that $U \subseteq W$. Can there exist bases $B_U$ for U
and $B_W$ for W such that $B_U \subseteq B_W$? Must such bases exist?

For any basis $B_U$ for U, must there be a basis $B_W$ for W such that $B_U \subseteq B_W$?
For any basis $B_W$ for W, must there be a basis $B_U$ for U such that $B_U \subseteq B_W$?
For any bases $B_U$, $B_W$ for U and W, must $B_U$ be a subset of $B_W$?

(b) Is the $\cap$ of bases a basis? For what space?

(c) Is the $\cup$ of bases a basis? For what space?

(d) What about the complement operation?

(Hint. Test any conjectures against some subspaces of $\mathbb{R}^3$.)

✓ 2.34 Consider how 'dimension' interacts with 'subset'. Assume U and W are both
subspaces of some vector space, and that $U \subseteq W$.

(a) Prove that $\dim(U) \leq \dim(W)$.

(b) Prove that equality of dimension holds if and only if $U = W$.

(c) Show that the prior item does not hold if they are infinite-dimensional.

? 2.35 [Wohascum no. 47] For any vector $\vec{v}$ in $\mathbb{R}^n$ and any permutation $\sigma$ of the
numbers $1, 2, \ldots, n$ (that is, $\sigma$ is a rearrangement of those numbers into a new
order), define $\sigma(\vec{v})$ to be the vector whose components are $v_{\sigma(1)}, v_{\sigma(2)}, \ldots$, and
$v_{\sigma(n)}$ (where $\sigma(1)$ is the first number in the rearrangement, etc.). Now fix $\vec{v}$ and
let V be the span of $\{\sigma(\vec{v}) \mid \sigma \text{ permutes } 1, \ldots, n\}$. What are the possibilities for
the dimension of V?
III.3 Vector Spaces and Linear Systems

We will now reconsider linear systems and Gauss's Method, aided by the tools
and terms of this chapter. We will make three points.

For the first point, recall the insight from the first chapter that if two
matrices are related by row operations $A \longrightarrow \cdots \longrightarrow B$ then each row of B is a
linear combination of the rows of A. That is, Gauss's Method works by taking
linear combinations of rows. Therefore, the right setting in which to study row 
operations in general, and Gauss's Method in particular, is the following vector 
space. 

3.1 Definition The row space of a matrix is the span of the set of its rows. The 
row rank is the dimension of the row space, the number of linearly independent 
rows. 

3.2 Example If

$$A = \begin{pmatrix} 2 & 3 \\ 4 & 6 \end{pmatrix}$$

then Rowspace(A) is this subspace of the space of two-component row vectors.

$$\{c_1 \cdot (2\ \ 3) + c_2 \cdot (4\ \ 6) \mid c_1, c_2 \in \mathbb{R}\}$$

The second is linearly dependent on the first and so we can simplify this
description to $\{c \cdot (2\ \ 3) \mid c \in \mathbb{R}\}$.

3.3 Lemma If two matrices A and B are related by a row operation

$$A \xrightarrow{\rho_i \leftrightarrow \rho_j} B \quad\text{or}\quad A \xrightarrow{k\rho_i} B \quad\text{or}\quad A \xrightarrow{k\rho_i + \rho_j} B$$

(for $i \neq j$ and $k \neq 0$) then their row spaces are equal. Hence, row-equivalent
matrices have the same row space and therefore the same row rank.

Proof Corollary One.III.2.4 shows that when $A \longrightarrow B$ then each row of B is a
linear combination of the rows of A. That is, in the above terminology, each row
of B is an element of the row space of A. Then $\text{Rowspace}(B) \subseteq \text{Rowspace}(A)$
follows because a member of the set Rowspace(B) is a linear combination of the
rows of B, so it is a combination of combinations of the rows of A, and so by
the Linear Combination Lemma is also a member of Rowspace(A).

For the other set containment, recall Lemma One.III.1.5, that row operations
are reversible, that $A \longrightarrow B$ if and only if $B \longrightarrow A$. Then $\text{Rowspace}(A) \subseteq \text{Rowspace}(B)$
follows as in the previous paragraph. QED

Thus, row operations leave the row space unchanged. But of course, Gauss's
Method performs the row operations systematically, with the goal of echelon
form.

3.4 Lemma The nonzero rows of an echelon form matrix make up a linearly
independent set.

Proof Lemma One.III.2.5 says that no nonzero row of an echelon form matrix is
a linear combination of the other rows. This restates that result in this chapter's
terminology. QED

Thus, in the language of this chapter, Gaussian reduction works by eliminating
linear dependences among rows, leaving the span unchanged, until no nontrivial
linear relationships remain among the nonzero rows. In short, Gauss's Method
produces a basis for the row space.

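3.5 Example From any matrix, we can produce a basis for the row space by
performing Gauss's Method and taking the nonzero rows of the resulting echelon
form matrix. For instance,

$$\begin{pmatrix} 1 & 3 & 1 \\ 1 & 4 & 1 \\ 2 & 0 & 5 \end{pmatrix} \xrightarrow[-2\rho_1 + \rho_3]{-\rho_1 + \rho_2} \; \xrightarrow{6\rho_2 + \rho_3} \begin{pmatrix} 1 & 3 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}$$

produces the basis $\langle (1\ \ 3\ \ 1), (0\ \ 1\ \ 0), (0\ \ 0\ \ 3) \rangle$ for the row space. This is
a basis for the row space of both the starting and ending matrices, since the two
row spaces are equal.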
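The technique of this example, reduce and keep the nonzero rows, can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: the starting matrix is taken to have rows (1 3 1), (1 4 1), (2 0 5), and the names `echelon` and `row_space_basis` are hypothetical.

```python
from fractions import Fraction

def echelon(rows):
    """Gauss's Method with exact fractions; returns the echelon form as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the row that receives the next pivot
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no leading entry available in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return m

def row_space_basis(rows):
    """The nonzero rows of the echelon form, which form a basis for the row space."""
    return [row for row in echelon(rows) if any(row)]

print(row_space_basis([[1, 3, 1], [1, 4, 1], [2, 0, 5]]))
print(row_space_basis([[2, 3], [4, 6]]))   # the matrix A of Example 3.2
```

The number of rows returned is the row rank, so the second call also confirms that the row space in Example 3.2 is one-dimensional.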

Using this technique, we can also find bases for spans not directly involving 
row vectors. 

3.6 Definition The column space of a matrix is the span of the set of its columns. 
The column rank is the dimension of the column space, the number of linearly 
independent columns. 

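Our interest in column spaces stems from our study of linear systems. An
example is that this system

$$\begin{aligned} c_1 + 3c_2 + 7c_3 &= d_1 \\ 2c_1 + 3c_2 + 8c_3 &= d_2 \\ c_2 + 2c_3 &= d_3 \\ 4c_1 + 4c_3 &= d_4 \end{aligned}$$

has a solution if and only if the vector of $d$'s is a linear combination of the other
column vectors,

$$c_1 \begin{pmatrix} 1 \\ 2 \\ 0 \\ 4 \end{pmatrix} + c_2 \begin{pmatrix} 3 \\ 3 \\ 1 \\ 0 \end{pmatrix} + c_3 \begin{pmatrix} 7 \\ 8 \\ 2 \\ 4 \end{pmatrix} = \begin{pmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \end{pmatrix}$$

meaning that the vector of $d$'s is in the column space of the matrix of coefficients.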






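3.7 Example Given this matrix,

$$\begin{pmatrix} 1 & 3 & 7 \\ 2 & 3 & 8 \\ 0 & 1 & 2 \\ 4 & 0 & 4 \end{pmatrix}$$

to get a basis for the column space, temporarily turn the columns into rows and
reduce.

$$\begin{pmatrix} 1 & 2 & 0 & 4 \\ 3 & 3 & 1 & 0 \\ 7 & 8 & 2 & 4 \end{pmatrix} \xrightarrow[-7\rho_1 + \rho_3]{-3\rho_1 + \rho_2} \; \xrightarrow{-2\rho_2 + \rho_3} \begin{pmatrix} 1 & 2 & 0 & 4 \\ 0 & -3 & 1 & -12 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

Now turn the rows back to columns.

$$\left\langle \begin{pmatrix} 1 \\ 2 \\ 0 \\ 4 \end{pmatrix}, \begin{pmatrix} 0 \\ -3 \\ 1 \\ -12 \end{pmatrix} \right\rangle$$

The result is a basis for the column space of the given matrix.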

3.8 Definition The transpose of a matrix is the result of interchanging the rows 
and columns of that matrix, so that column j of the matrix A is row j of $A^{\mathsf{T}}$, 
and vice versa. 

So we can summarize the prior example as "transpose, reduce, and transpose 
back." 
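
The "transpose, reduce, and transpose back" recipe can be sketched as follows. The helper names (`transpose`, `nonzero_echelon_rows`, `column_space_basis`) are hypothetical, chosen only for this illustration, and the input is the 4x3 matrix whose columns appear in the linear system before Example 3.7.

```python
from fractions import Fraction

def transpose(m):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*m)]

def nonzero_echelon_rows(rows):
    """Run Gauss's Method and keep the nonzero rows of the echelon form."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        src = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if src is None:
            continue
        m[r], m[src] = m[src], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return m[:r]

def column_space_basis(m):
    """Return a matrix whose columns form a basis for the column space of m."""
    return transpose(nonzero_echelon_rows(transpose(m)))

A = [[1, 3, 7], [2, 3, 8], [0, 1, 2], [4, 0, 4]]
B = column_space_basis(A)
print([[int(x) for x in row] for row in B])
```

The columns of the printed matrix are the basis vectors, so a result with two columns says that the column rank of the input is two.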

We can even, at the price of tolerating the as-yet-vague idea of vector spaces 
being "the same," use Gauss's Method to find bases for spans in other types of 
vector spaces. 

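3.9 Example To get a basis for the span of $\{x^2 + x^4,\ 2x^2 + 3x^4,\ -x^2 - 3x^4\}$ in 
the space $\mathcal{P}_4$, think of these three polynomials as "the same" as the row 
vectors $(0\ 0\ 1\ 0\ 1)$, $(0\ 0\ 2\ 0\ 3)$, and $(0\ 0\ {-1}\ 0\ {-3})$, apply Gauss's 
Method 

\[
\begin{pmatrix} 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 2 & 0 & 3 \\ 0 & 0 & -1 & 0 & -3 \end{pmatrix}
\xrightarrow[\rho_1+\rho_3]{-2\rho_1+\rho_2}
\;\xrightarrow{2\rho_2+\rho_3}\;
\begin{pmatrix} 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}
\]

and translate back to get the basis $\langle x^2 + x^4,\ x^4 \rangle$. (As mentioned earlier, we will 
make the phrase "the same" precise at the start of the next chapter.) 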
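
As a rough sketch of the translation used in Example 3.9, the code below stores a polynomial of degree at most four as its list of coefficients (constant term first), reduces the rows, and translates the nonzero rows back. The names `reduce_rows` and `to_poly` are invented for this illustration.

```python
from fractions import Fraction

def reduce_rows(rows):
    """Gauss's Method; returns the nonzero rows of an echelon form."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        src = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if src is None:
            continue
        m[r], m[src] = m[src], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return m[:r]

def to_poly(coeffs):
    """Translate a coefficient row (constant term first) back to a polynomial string."""
    terms = []
    for power, c in enumerate(coeffs):
        if c == 0:
            continue
        var = "1" if power == 0 else ("x" if power == 1 else f"x^{power}")
        terms.append(var if c == 1 else f"{c}*{var}")
    return " + ".join(terms) if terms else "0"

# x^2 + x^4, 2x^2 + 3x^4, -x^2 - 3x^4 as coefficient rows
polys = [[0, 0, 1, 0, 1], [0, 0, 2, 0, 3], [0, 0, -1, 0, -3]]
basis = [to_poly(row) for row in reduce_rows(polys)]
print(basis)
```

The choice to list the constant term first is arbitrary; any fixed ordering of the coefficients works as long as it is used consistently in both directions.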

Thus, our first point in this subsection is that the tools of this chapter give 
us a more conceptual understanding of Gaussian reduction. 

For the second point of this subsection, observe that row operations on a 
matrix can change its column space. 

\[
\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}
\xrightarrow{-2\rho_1+\rho_2}
\begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix}
\]

The column space of the left-hand matrix contains vectors with a second 
component that is nonzero but the column space of the right-hand matrix is different 
because it contains only vectors whose second component is zero. It is this 
observation that makes the next result surprising. 

3.10 Lemma Row operations do not change the column rank. 

Proof Restated, if A reduces to B then the column rank of B equals the column 
rank of A. 

We will be done if we can show that row operations do not affect linear 
relationships among columns because the column rank is just the size of the 
largest set of unrelated columns. That is, we will show that a relationship exists 
among columns (such as that the fifth column is twice the second plus the 
fourth) if and only if that relationship exists after the row operation. But this 
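is exactly the first theorem of this book, Theorem One.I.1.5: in a relationship 
among columns, 

\[
c_1 \begin{pmatrix} a_{1,1} \\ a_{2,1} \\ \vdots \\ a_{m,1} \end{pmatrix}
+ \cdots +
c_n \begin{pmatrix} a_{1,n} \\ a_{2,n} \\ \vdots \\ a_{m,n} \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
\]

row operations leave unchanged the set of solutions $(c_1, \ldots, c_n)$. QED 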

Another way, besides the prior result, to state that Gauss's Method has 
something to say about the column space as well as about the row space is with 
Gauss-Jordan reduction. Recall that it ends with the reduced echelon form of a 
matrix, as here. 




\[
\begin{pmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\]















Consider the row space and the column space of this result. Our first point made 
above says that a basis for the row space is easy to get: simply collect together all 
of the rows with leading entries. However, because this is a reduced echelon form 
matrix, a basis for the column space is just as easy: take the columns containing 
the leading entries, that is, $\langle \vec{e}_1, \vec{e}_2 \rangle$. (Linear independence is obvious. The other 
columns are in the span of this set, since they all have a third component of 
zero.) Thus, for a reduced echelon form matrix, we can find bases for the row 
and column spaces in essentially the same way: by taking the parts of the matrix, 
the rows or columns, containing the leading entries. 



3.11 Theorem The row rank and column rank of a matrix are equal. 



Proof Bring the matrix to reduced echelon form. At that point, the row rank 
equals the number of leading entries since that equals the number of nonzero 
rows. Also at that point, the number of leading entries equals the column rank 
because the set of columns containing leading entries consists of some of the $\vec{e}_i$'s 
from a standard basis, and that set is linearly independent and spans the set of 
columns. Hence, in the reduced echelon form matrix, the row rank equals the 
column rank, because each equals the number of leading entries. 

But Lemma 3.3 and Lemma 3.10 show that the row rank and column rank 
are not changed by using row operations to get to reduced echelon form. Thus 
the row rank and the column rank of the original matrix are also equal. QED 
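
Theorem 3.11 can be spot-checked numerically. The sketch below is not a proof; it computes the row rank by Gauss's Method and the column rank as the row rank of the transpose, on a few sample matrices chosen for illustration. The function names are assumptions of this sketch.

```python
from fractions import Fraction

def rank(rows):
    """Count the nonzero rows left after Gauss's Method."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        src = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if src is None:
            continue
        m[r], m[src] = m[src], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def transpose(m):
    return [list(col) for col in zip(*m)]

samples = [
    [[1, 3, 7], [2, 3, 8], [0, 1, 2], [4, 0, 4]],
    [[1, 2], [2, 4]],
    [[1, 3, 1], [1, 4, 1], [2, 0, 5]],
]
# Each pair is (row rank, column rank); the theorem says the two always agree.
ranks = [(rank(a), rank(transpose(a))) for a in samples]
print(ranks)
```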

3.12 Definition The rank of a matrix is its row rank or column rank. 

So our second point in this subsection is that the column space and row 
space of a matrix have the same dimension. Our third and final point is that 
the concepts that we've seen arising naturally in the study of vector spaces are 
exactly the ones that we have studied with linear systems. 

3.13 Theorem For linear systems with n unknowns and with matrix of coefficients 
A, the statements 

(1) the rank of A is r 

(2) the space of solutions of the associated homogeneous system has dimension 
n — r 

are equivalent. 

So if the system has at least one particular solution then for the set of solutions, 
the number of parameters equals n — r, the number of variables minus the rank 
of the matrix of coefficients. 

Proof The rank of A is r if and only if Gaussian reduction on A ends with r 
nonzero rows. That's true if and only if echelon form matrices row equivalent 
to A have r-many leading variables. That in turn holds if and only if there are 
n — r free variables. QED 
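
The count in Theorem 3.13 can be illustrated by reducing a sample coefficient matrix to reduced echelon form and building one solution of the homogeneous system per free variable. This is a sketch under assumptions: the names `rref` and `null_space_basis` and the sample matrix are chosen only for the illustration.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan reduction; returns the reduced matrix and the leading columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    lead_cols, r = [], 0
    for c in range(len(m[0])):
        src = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if src is None:
            continue
        m[r], m[src] = m[src], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        lead_cols.append(c)
        r += 1
    return m, lead_cols

def null_space_basis(rows):
    """One basis vector for each free variable of the homogeneous system."""
    m, lead_cols = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in lead_cols):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for r, lc in enumerate(lead_cols):
            v[lc] = -m[r][free]
        basis.append(v)
    return basis

A = [[1, 3, 1, 6], [2, 6, 3, 16], [1, 3, 1, 6]]
_, leads = rref(A)
solutions = null_space_basis(A)
print(len(A[0]), len(leads), len(solutions))  # n, r, and n - r
```

Here n is the number of unknowns, r the number of leading variables, and the length of `solutions` is the number of parameters in the general solution of the homogeneous system.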

3.14 Remark [Munkres] Sometimes that result is mistakenly remembered to say 
that the general solution of an n unknown system of m equations uses n — m 
parameters. The number of equations is not the relevant figure, rather, what 
matters is the number of independent equations (the number of equations in 
a maximal independent set). Where there are r independent equations, the 
general solution involves n — r parameters. 

3.15 Corollary Where the matrix A is $n \times n$, the statements 

(1) the rank of A is n 

(2) A is nonsingular 

(3) the rows of A form a linearly independent set 

(4) the columns of A form a linearly independent set 

(5) any linear system whose matrix of coefficients is A has one and only one 
solution 

are equivalent. 

Proof Clearly (1) $\iff$ (2) $\iff$ (3) $\iff$ (4). The last, (4) $\iff$ (5), holds 
because a set of n column vectors is linearly independent if and only if it is a 
basis for $\mathbb{R}^n$, but the system 

\[
c_1 \begin{pmatrix} a_{1,1} \\ a_{2,1} \\ \vdots \\ a_{n,1} \end{pmatrix}
+ \cdots +
c_n \begin{pmatrix} a_{1,n} \\ a_{2,n} \\ \vdots \\ a_{n,n} \end{pmatrix}
= \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{pmatrix}
\]

has a unique solution for all choices of $d_1, \ldots, d_n \in \mathbb{R}$ if and only if the vectors 
of a's form a basis. QED 



Exercises 

3.16 Transpose each. 



(a) 



(b) 



(c) 



(d) 



(e) (-1 -2] 

/ 3.17 Decide if the vector is in the row space of the matrix. 

/ 1 3\ 



(a) 



, (1 0) (b) 1-1 1 1, (1 1 1) 



/ 3.18 Decide if the vector is in the column space. 



(a) 



1 1 
1 1 



(b) 




/ 3.19 Decide if the vector is in the column space of the matrix, 

/ 1 



(a) 



(b) 



4 

2 -4 



(c) 



1 

1 1 

-1 -1 



/ 3.20 Find a basis for the row space of this matrix. 



(1 

1 

3 1 

M 



4\ 

1 

2 

-1/ 




/ 3.21 Find the rank of each matrix. 
/2 1 3^ 
(a) h -1 2 

\i 3y 
/o o\ 

(d) 

\0 0/ 

/ 3.22 Find a basis for the span of each set. 

(a){(1 3),M 3),(1 4), (2 l)}CM,x2 




X, 1 -x^,3 + 2x-x2}C?3 
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3.23 Which matrices have rank zero? Rank one? 
/ 3.24 Given $a, b, c \in \mathbb{R}$, what choice of d will cause this matrix to have the rank of 
one? 

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]

3.25 Find the column rank of this matrix. 

'13-1504 




y2 1 4 1, 
3.26 Show that a linear system with at least one solution has at most one solution if 
and only if the matrix of coefficients has rank equal to the number of its columns. 
/ 3.27 If a matrix is 5x9, which set must be dependent, its set of rows or its set of 
columns? 

3.28 Give an example to show that, despite that they have the same dimension, the 
row space and column space of a matrix need not be equal. Are they ever equal? 

3.29 Show that the set $\{(1, -1, 2, -3), (1, 1, 2, 0), (3, -1, 6, -6)\}$ does not have the 
same span as $\{(1, 0, 1, 0), (0, 2, 0, 3)\}$. What, by the way, is the vector space? 

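/ 3.30 Show that this set of column vectors 

\[
\left\{ \begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix} \;\middle|\;
\text{there are } x, y, \text{ and } z \text{ such that: }
\begin{aligned} 3x + 2y + 4z &= d_1 \\ x - z &= d_2 \\ 2x + 2y + 5z &= d_3 \end{aligned}
\right\}
\]

is a subspace of $\mathbb{R}^3$. Find a basis. 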
3.31 Show that the transpose operation is linear: 

\[
(rA + sB)^{\mathsf{T}} = rA^{\mathsf{T}} + sB^{\mathsf{T}}
\]

for $r, s \in \mathbb{R}$ and $A, B \in \mathcal{M}_{m \times n}$. 
/ 3.32 In this subsection we have shown that Gaussian reduction finds a basis for the 
row space. 

(a) Show that this basis is not unique — different reductions may yield different 
bases. 

(b) Produce matrices with equal row spaces but unequal numbers of rows. 

(c) Prove that two matrices have equal row spaces if and only if after Gauss-Jordan 
reduction they have the same nonzero rows. 

3.33 Why is there not a problem with Remark 3.14 in the case that r is bigger than 
n? 

3.34 Show that the row rank of an mxn matrix is at most m. Is there a better 
bound? 

/ 3.35 Show that the rank of a matrix equals the rank of its transpose. 

3.36 True or false: the column space of a matrix equals the row space of its transpose. 
/ 3.37 We have seen that a row operation may change the column space. Must it? 

3.38 Prove that a linear system has a solution if and only if that system's matrix of 
coefficients has the same rank as its augmented matrix. 

3.39 An mxn matrix has full row rank if its row rank is m, and it has full column 
rank if its column rank is n. 

(a) Show that a matrix can have both full row rank and full column rank only if 
it is square. 

(b) Prove that the linear system with matrix of coefficients A has a solution for 
any di , . . . , dn's on the right side if and only if A has full row rank. 
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(c) Prove that a homogeneous system has a unique solution if and only if its 
matrix of coefficients A has full column rank. 

(d) Prove that the statement "if a system with matrix of coefficients A has any 
solution then it has a unique solution" holds if and only if A has full column 
rank. 

3.40 How would the conclusion of Lemma 3.3 change if Gauss's Method were changed 
to allow multiplying a row by zero? 

/ 3.41 What is the relationship between rank(A) and rank(−A)? Between rank(A) 
and rank(kA)? What, if any, is the relationship between rank(A), rank(B), and 
rank(A + B)? 



III.4 Combining Subspaces 

This subsection is optional. It is required only for the last sections of 
Chapter Three and Chapter Five and for occasional exercises, and can be 
passed over without loss of continuity. 

One way to understand something is to see how to build it from component 
parts. For instance, we sometimes think of $\mathbb{R}^3$ as in some way put together 
from the x-axis, the y-axis, and z-axis. In this subsection we will describe how 
to decompose a vector space into a combination of some of its subspaces. In 
developing this idea of subspace combination, we will keep the $\mathbb{R}^3$ example in 
mind as a prototype. 

Subspaces are subsets and sets combine via union. But taking the combination 
operation for subspaces to be the simple union operation isn't what we want. 
For instance, the union of the x-axis, the y-axis, and z-axis is not all of $\mathbb{R}^3$ and 
in fact this union of subspaces is not a subspace because it is not closed under 
addition: this vector 

\[
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
+ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
+ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
\]

is in none of the three axes and hence is not in the union. Therefore, in addition 
to the members of the subspaces we must at least also include all of the linear 
combinations. 

4.1 Definition Where $W_1, \ldots, W_k$ are subspaces of a vector space, their sum is 
the span of their union $W_1 + W_2 + \cdots + W_k = [W_1 \cup W_2 \cup \cdots \cup W_k]$. 

Writing '+' here fits with the practice of using this symbol for a natural 
accumulation operation. 

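4.2 Example The prototype works with this. Any vector $\vec{w} \in \mathbb{R}^3$ is a linear 
combination $c_1\vec{v}_1 + c_2\vec{v}_2 + c_3\vec{v}_3$ where $\vec{v}_1$ is a member of the x-axis, etc., in this 
way 

\[
\begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix}
= 1 \cdot \begin{pmatrix} w_1 \\ 0 \\ 0 \end{pmatrix}
+ 1 \cdot \begin{pmatrix} 0 \\ w_2 \\ 0 \end{pmatrix}
+ 1 \cdot \begin{pmatrix} 0 \\ 0 \\ w_3 \end{pmatrix}
\]

and so $\mathbb{R}^3$ = x-axis + y-axis + z-axis. 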

4.3 Example A sum of subspaces can be less than the entire space. Inside of $\mathcal{P}_4$, 
let L be the subspace of linear polynomials $\{a + bx \mid a, b \in \mathbb{R}\}$ and let C be the 
subspace of purely-cubic polynomials $\{cx^3 \mid c \in \mathbb{R}\}$. Then L + C is not all of $\mathcal{P}_4$. 
Instead, $L + C = \{a + bx + cx^3 \mid a, b, c \in \mathbb{R}\}$. 

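4.4 Example A space can be described as a combination of subspaces in more 
than one way. Besides the decomposition $\mathbb{R}^3$ = x-axis + y-axis + z-axis, we can 
also write $\mathbb{R}^3$ = xy-plane + yz-plane. To check this, note that any $\vec{w} \in \mathbb{R}^3$ can 
be written as a linear combination of a member of the xy-plane and a member 
of the yz-plane; here are two such combinations. 

\[
\begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix}
= 1 \cdot \begin{pmatrix} w_1 \\ w_2 \\ 0 \end{pmatrix}
+ 1 \cdot \begin{pmatrix} 0 \\ 0 \\ w_3 \end{pmatrix}
\qquad
\begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix}
= 1 \cdot \begin{pmatrix} w_1 \\ w_2/2 \\ 0 \end{pmatrix}
+ 1 \cdot \begin{pmatrix} 0 \\ w_2/2 \\ w_3 \end{pmatrix}
\]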



The above definition gives one way in which we can think of a space as a 
combination of some of its parts. However, the prior example shows that there is 
at least one interesting property of our benchmark model that is not captured by 
the definition of the sum of subspaces. In the familiar decomposition of $\mathbb{R}^3$, we 
often speak of a vector's 'x part' or 'y part' or 'z part'. That is, in our prototype 
each vector has a unique decomposition into parts that come from the parts 
making up the whole space. But in the decomposition used in Example 4.4, we 
cannot refer to the "xy part" of a vector: these three sums 

\[
\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 2 \\ 3 \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \\ 3 \end{pmatrix}
\]

all describe the vector as comprised of something from the first plane plus 
something from the second plane, but the "xy part" is different in each. 

That is, when we consider how $\mathbb{R}^3$ is put together from the three axes we 
might mean "in such a way that every vector has at least one decomposition," 
and that leads to the definition above. But if we take it to mean "in such a way 
that every vector has one and only one decomposition" then we need another 
condition on combinations. To see what this condition is, recall that vectors are 
uniquely represented in terms of a basis. We can use this to break a space into a 
sum of subspaces such that any vector in the space breaks uniquely into a sum 
of members of those subspaces. 

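4.5 Example Consider $\mathbb{R}^3$ with its standard basis $\mathcal{E}_3 = \langle \vec{e}_1, \vec{e}_2, \vec{e}_3 \rangle$. The subspace 
with the basis $B_1 = \langle \vec{e}_1 \rangle$ is the x-axis. The subspace with the basis $B_2 = \langle \vec{e}_2 \rangle$ is 
the y-axis. The subspace with the basis $B_3 = \langle \vec{e}_3 \rangle$ is the z-axis. The fact that 
any member of $\mathbb{R}^3$ is expressible as a sum of vectors from these subspaces 

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} x \\ 0 \\ 0 \end{pmatrix}
+ \begin{pmatrix} 0 \\ y \\ 0 \end{pmatrix}
+ \begin{pmatrix} 0 \\ 0 \\ z \end{pmatrix}
\]

is a reflection of the fact that $\mathcal{E}_3$ spans the space: this equation 

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= c_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
+ c_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
+ c_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
\]

has a solution for any $x, y, z \in \mathbb{R}$. And, the fact that each such expression is 
unique reflects the fact that $\mathcal{E}_3$ is linearly independent: any equation like the 
one above has a unique solution. 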

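4.6 Example We don't have to take the basis vectors one at a time; the same 
idea works if we conglomerate them into larger sequences. Consider again the 
space $\mathbb{R}^3$ and the vectors from the standard basis $\mathcal{E}_3$. The subspace with the 
basis $B_1 = \langle \vec{e}_1, \vec{e}_3 \rangle$ is the xz-plane. The subspace with the basis $B_2 = \langle \vec{e}_2 \rangle$ is 
the y-axis. As in the prior example, the fact that any member of the space is a 
sum of members of the two subspaces in one and only one way 

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} x \\ 0 \\ z \end{pmatrix}
+ \begin{pmatrix} 0 \\ y \\ 0 \end{pmatrix}
\]

is a reflection of the fact that these vectors form a basis: this system 

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \left( c_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
+ c_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right)
+ c_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
\]

has one and only one solution for any $x, y, z \in \mathbb{R}$. 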

These examples illustrate a natural way to decompose a space into a sum of 
subspaces in such a way that each vector decomposes uniquely into a sum of 
vectors from the parts. 

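4.7 Definition The concatenation of the sequences $B_1 = \langle \vec{\beta}_{1,1}, \ldots, \vec{\beta}_{1,n_1} \rangle$, ..., 
$B_k = \langle \vec{\beta}_{k,1}, \ldots, \vec{\beta}_{k,n_k} \rangle$ adjoins them into a single sequence. 

\[
B_1 {}^\frown B_2 {}^\frown \cdots {}^\frown B_k
= \langle \vec{\beta}_{1,1}, \ldots, \vec{\beta}_{1,n_1}, \vec{\beta}_{2,1}, \ldots, \vec{\beta}_{k,n_k} \rangle
\]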



4.8 Lemma Let V be a vector space that is the sum of some of its subspaces 
$V = W_1 + \cdots + W_k$. Let $B_1, \ldots, B_k$ be bases for these subspaces. The following 
are equivalent. 

(1) The expression of any $\vec{v} \in V$ as a combination $\vec{v} = \vec{w}_1 + \cdots + \vec{w}_k$ with 
$\vec{w}_i \in W_i$ is unique. 

(2) The concatenation $B_1 {}^\frown \cdots {}^\frown B_k$ is a basis for V. 

(3) The nonzero members of $\{\vec{w}_1, \ldots, \vec{w}_k\}$, with $\vec{w}_i \in W_i$, form a linearly 
independent set. 

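Proof We will show that (1) $\implies$ (2), that (2) $\implies$ (3), and finally that 
(3) $\implies$ (1). For these arguments, observe that we can pass from a combination 
of $\vec{w}$'s to a combination of $\vec{\beta}$'s 

\[
\begin{aligned}
d_1\vec{w}_1 + \cdots + d_k\vec{w}_k
&= d_1(c_{1,1}\vec{\beta}_{1,1} + \cdots + c_{1,n_1}\vec{\beta}_{1,n_1}) + \cdots + d_k(c_{k,1}\vec{\beta}_{k,1} + \cdots + c_{k,n_k}\vec{\beta}_{k,n_k}) \\
&= d_1c_{1,1} \cdot \vec{\beta}_{1,1} + \cdots + d_kc_{k,n_k} \cdot \vec{\beta}_{k,n_k}
\end{aligned}
\tag{$*$}
\]

and vice versa (we can move from the bottom to the top by taking each $d_i$ to 
be 1). 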

For (1) $\implies$ (2), assume that all decompositions are unique. We will show 
that $B_1 {}^\frown \cdots {}^\frown B_k$ spans the space and is linearly independent. It spans the 
space because the assumption that $V = W_1 + \cdots + W_k$ means that every $\vec{v}$ 
can be expressed as $\vec{v} = \vec{w}_1 + \cdots + \vec{w}_k$, which translates by equation ($*$) to an 
expression of $\vec{v}$ as a linear combination of the $\vec{\beta}$'s from the concatenation. For 
linear independence, consider this linear relationship. 

\[
\vec{0} = c_{1,1}\vec{\beta}_{1,1} + \cdots + c_{k,n_k}\vec{\beta}_{k,n_k}
\]

Regroup as in ($*$) (that is, move from bottom to top) to get the decomposition 
$\vec{0} = \vec{w}_1 + \cdots + \vec{w}_k$. Because the zero vector obviously has the decomposition 
$\vec{0} = \vec{0} + \cdots + \vec{0}$, the assumption that decompositions are unique shows that each 
$\vec{w}_i$ is the zero vector. This means that $c_{i,1}\vec{\beta}_{i,1} + \cdots + c_{i,n_i}\vec{\beta}_{i,n_i} = \vec{0}$. Thus, 
since each $B_i$ is a basis, we have the desired conclusion that all of the c's are 
zero. 

For (2) $\implies$ (3), assume that $B_1 {}^\frown \cdots {}^\frown B_k$ is a basis for the space. Consider 
a linear relationship among nonzero vectors from different $W_i$'s, 

\[
\vec{0} = \cdots + d_i\vec{w}_i + \cdots
\]

in order to show that it is trivial. (The relationship is written in this way 
because we are considering a combination of nonzero vectors from only some 
of the $W_i$'s; for instance, there might not be a $\vec{w}_1$ in this combination.) As in 
($*$), 

\[
\vec{0} = \cdots + d_i(c_{i,1}\vec{\beta}_{i,1} + \cdots + c_{i,n_i}\vec{\beta}_{i,n_i}) + \cdots
= \cdots + d_ic_{i,1} \cdot \vec{\beta}_{i,1} + \cdots + d_ic_{i,n_i} \cdot \vec{\beta}_{i,n_i} + \cdots
\]

and the linear independence of $B_1 {}^\frown \cdots {}^\frown B_k$ gives that each 
coefficient $d_ic_{i,j}$ is zero. Now, $\vec{w}_i$ is a nonzero vector, so at least one of the $c_{i,j}$'s 
is not zero, and thus $d_i$ is zero. This holds for each $d_i$, and therefore the linear 
relationship is trivial. 

Finally, for (3) $\implies$ (1), assume that, among nonzero vectors from different 
$W_i$'s, any linear relationship is trivial. Consider two decompositions of a vector 
$\vec{v} = \vec{w}_1 + \cdots + \vec{w}_k$ and $\vec{v} = \vec{u}_1 + \cdots + \vec{u}_k$ in order to show that the two are the 
same. We have 

\[
\vec{0} = (\vec{w}_1 + \cdots + \vec{w}_k) - (\vec{u}_1 + \cdots + \vec{u}_k) = (\vec{w}_1 - \vec{u}_1) + \cdots + (\vec{w}_k - \vec{u}_k)
\]

which violates the assumption unless each $\vec{w}_i - \vec{u}_i$ is the zero vector. Hence, 
decompositions are unique. QED 

4.9 Definition A collection of subspaces $\{W_1, \ldots, W_k\}$ is independent if no 
nonzero vector from any $W_i$ is a linear combination of vectors from the other 
subspaces $W_1, \ldots, W_{i-1}, W_{i+1}, \ldots, W_k$. 



4.10 Definition A vector space V is the direct sum (or internal direct sum) 
of its subspaces $W_1, \ldots, W_k$ if $V = W_1 + W_2 + \cdots + W_k$ and the collection 
$\{W_1, \ldots, W_k\}$ is independent. We write $V = W_1 \oplus W_2 \oplus \cdots \oplus W_k$. 

4.11 Example Our prototype works: $\mathbb{R}^3$ = x-axis $\oplus$ y-axis $\oplus$ z-axis. 

4.12 Example The space of 2x2 matrices is this direct sum. 

It is the direct sum of subspaces in many other ways as well; direct sum 
decompositions are not unique. 

4.13 Corollary The dimension of a direct sum is the sum of the dimensions of its 
summands. 

Proof In Lemma 4.8, the number of basis vectors in the concatenation equals the 
sum of the number of vectors in the sub-bases that make up the concatenation. 
QED 

The special case of two subspaces is worth mentioning separately. 

4.14 Definition When a vector space is the direct sum of two of its subspaces 
then they are complements. 



4.15 Lemma A vector space V is the direct sum of two of its subspaces $W_1$ and 
$W_2$ if and only if it is the sum of the two $V = W_1 + W_2$ and their intersection 
is trivial $W_1 \cap W_2 = \{\vec{0}\}$. 

Proof Suppose first that $V = W_1 \oplus W_2$. By definition, V is the sum of the two. 
To show that they have a trivial intersection, let $\vec{v}$ be a vector from $W_1 \cap W_2$ 
and consider the equation $\vec{v} = \vec{v}$. On the left side of that equation is a member 
of $W_1$, and on the right side is a member of $W_2$, which we can think of as a 
linear combination of members (of only one member) of $W_2$. But the spaces are 
independent so the only way a member of $W_1$ can be a linear combination of 
members of $W_2$ is if it is the zero vector $\vec{v} = \vec{0}$. 

For the other direction, suppose that V is the sum of two spaces with a trivial 
intersection. To show that V is a direct sum of the two, we need only show that 
the spaces are independent: no nonzero member of the first is expressible as a 
linear combination of members of the second, and vice versa. This is true because 
any relationship $\vec{w}_1 = c_1\vec{w}_{2,1} + \cdots + c_k\vec{w}_{2,k}$ (with $\vec{w}_1 \in W_1$ and $\vec{w}_{2,j} \in W_2$ for 
all j) shows that the vector on the left is also in $W_2$, since the right side is a 
combination of members of $W_2$. The intersection of these two spaces is trivial, 
so $\vec{w}_1 = \vec{0}$. The same argument works for any $\vec{w}_2$. QED 

4.16 Example In the space $\mathbb{R}^2$, the x-axis and the y-axis are complements, that 
is, $\mathbb{R}^2$ = x-axis $\oplus$ y-axis. A space can have more than one pair of complementary 
subspaces; another pair here are the subspaces consisting of the lines y = x and 
y = 2x. 

4.17 Example In the space $F = \{a\cos\theta + b\sin\theta \mid a, b \in \mathbb{R}\}$, the subspaces $W_1 = 
\{a\cos\theta \mid a \in \mathbb{R}\}$ and $W_2 = \{b\sin\theta \mid b \in \mathbb{R}\}$ are complements. In addition to 
the fact that a space like F can have more than one pair of complementary 
subspaces, inside of the space a single subspace like $W_1$ can have more than one 
complement: another complement of $W_1$ is $W_3 = \{b\sin\theta + b\cos\theta \mid b \in \mathbb{R}\}$. 

4.18 Example In $\mathbb{R}^3$, the xy-plane and the yz-plane are not complements, which 
is the point of the discussion following Example 4.4. One complement of the 
xy-plane is the z-axis. A complement of the yz-plane is the line through (1,1,1). 

Following Lemma 4.15, here is a natural question: is the simple sum $V = 
W_1 + \cdots + W_k$ also a direct sum if and only if the intersection of the subspaces 
is trivial? 

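4.19 Example If there are more than two subspaces then having a trivial 
intersection is not enough to guarantee unique decomposition (i.e., is not enough to 
ensure that the spaces are independent). In $\mathbb{R}^3$, let $W_1$ be the x-axis, let $W_2$ be 
the y-axis, and let $W_3$ be this. 

\[
W_3 = \left\{ \begin{pmatrix} q \\ q \\ r \end{pmatrix} \;\middle|\; q, r \in \mathbb{R} \right\}
\]

The check that $\mathbb{R}^3 = W_1 + W_2 + W_3$ is easy. The intersection $W_1 \cap W_2 \cap W_3$ is 
trivial, but decompositions aren't unique. 

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ y - x \\ 0 \end{pmatrix} + \begin{pmatrix} x \\ x \\ z \end{pmatrix}
= \begin{pmatrix} x - y \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} y \\ y \\ z \end{pmatrix}
\]

(This example also shows that this requirement is also not enough: that all 
pairwise intersections of the subspaces be trivial. See Exercise 4.30.) 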
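
Lemma 4.8 suggests a mechanical test, sketched here under stated assumptions: given bases for the subspaces, concatenate them and check whether the concatenation is linearly independent, that is, whether its rank equals its length. The names `rank` and `is_direct` are made up for this illustration, and the bases below are one possible choice for the subspaces of $\mathbb{R}^3$ discussed in this subsection.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (Gauss's Method)."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(m[0])):
        src = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if src is None:
            continue
        m[r], m[src] = m[src], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_direct(bases):
    """True when the concatenation of the given bases is linearly independent."""
    concatenation = [v for basis in bases for v in basis]
    return rank(concatenation) == len(concatenation)

x_axis, y_axis, z_axis = [[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]
xy_plane = [[1, 0, 0], [0, 1, 0]]
yz_plane = [[0, 1, 0], [0, 0, 1]]
w3 = [[1, 1, 0], [0, 0, 1]]  # a basis for the set of vectors (q, q, r)

print(is_direct([x_axis, y_axis, z_axis]))  # the prototype
print(is_direct([xy_plane, yz_plane]))      # the two planes of Example 4.4
print(is_direct([x_axis, y_axis, w3]))      # the three subspaces of Example 4.19
```

In the last two cases the concatenation has four vectors in a three-dimensional space, so it cannot be independent; this matches the failure of unique decomposition described in the text.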






In this subsection we have seen two ways to regard a space as built up from 
component parts. Both are useful; in particular we will use the direct sum 
definition to do the Jordan Form construction at the end of the fifth chapter. 

Exercises 

/ 4.20 Decide if $\mathbb{R}^2$ is the direct sum of each $W_1$ and $W_2$. 

(a) W, ={(^^ |xeE},W2={Q |xeR} 

(b) Wi = { Q I s e R}, W2 = { (^^ J I s e R} 

(c) W, =R2, W2 ={0} 

(d) Wi =W2 = {Q I teR} 

/ 4.21 Show that $\mathbb{R}^3$ is the direct sum of the xy-plane with each of these. 
(a) the z-axis 

(b) the line 

\[
\left\{ \begin{pmatrix} z \\ z \\ z \end{pmatrix} \;\middle|\; z \in \mathbb{R} \right\}
\]

4.22 Is $\mathcal{P}_2$ the direct sum of $\{a + bx^2 \mid a, b \in \mathbb{R}\}$ and $\{cx \mid c \in \mathbb{R}\}$? 
/ 4.23 In $\mathcal{P}_n$, the even polynomials are the members of this set 
$\mathcal{E} = \{p \in \mathcal{P}_n \mid p(-x) = p(x) \text{ for all } x\}$ 
and the odd polynomials are the members of this set. 

$\mathcal{O} = \{p \in \mathcal{P}_n \mid p(-x) = -p(x) \text{ for all } x\}$ 
Show that these are complementary subspaces. 

4.24 Which of these subspaces of $\mathbb{R}^3$ 

$W_1$: the x-axis, $W_2$: the y-axis, $W_3$: the z-axis, 
$W_4$: the plane x + y + z = 0, $W_5$: the yz-plane 
can be combined to 

(a) sum to $\mathbb{R}^3$? (b) direct sum to $\mathbb{R}^3$? 
/ 4.25 Show that $\mathcal{P}_n = \{a_0 \mid a_0 \in \mathbb{R}\} \oplus \cdots \oplus \{a_nx^n \mid a_n \in \mathbb{R}\}$. 

4.26 What is $W_1 + W_2$ if $W_1 \subseteq W_2$? 

4.27 Does Example 4.5 generalize? That is, is this true or false: if a vector space V 
has a basis $\langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ then it is the direct sum of the spans of the one-dimensional 
subspaces $V = [\{\vec{\beta}_1\}] \oplus \cdots \oplus [\{\vec{\beta}_n\}]$? 

4.28 Can $\mathbb{R}^4$ be decomposed as a direct sum in two different ways? Can $\mathbb{R}^1$? 

4.29 This exercise makes the notation of writing '+' between sets more natural. 
Prove that, where $W_1, \ldots, W_k$ are subspaces of a vector space, 

\[
W_1 + \cdots + W_k = \{\vec{w}_1 + \vec{w}_2 + \cdots + \vec{w}_k \mid \vec{w}_1 \in W_1, \ldots, \vec{w}_k \in W_k\},
\]

and so the sum of subspaces is the subspace of all sums. 

4.30 (Refer to Example 4.19. This exercise shows that the requirement that pairwise 
intersections be trivial is genuinely stronger than the requirement only that the 
intersection of all of the subspaces be trivial.) Give a vector space and three 
subspaces $W_1$, $W_2$, and $W_3$ such that the space is the sum of the subspaces, 
the intersection of all three subspaces $W_1 \cap W_2 \cap W_3$ is trivial, but the pairwise 
intersections $W_1 \cap W_2$, $W_1 \cap W_3$, and $W_2 \cap W_3$ are nontrivial. 

/ 4.31 Prove that if $V = W_1 \oplus \cdots \oplus W_k$ then $W_i \cap W_j$ is trivial whenever $i \neq j$. This 
shows that the first half of the proof of Lemma 4.15 extends to the case of more 
than two subspaces. (Example 4.19 shows that this implication does not reverse; 
the other half does not extend.) 
4.32 Recall that no linearly independent set contains the zero vector. Can an 
independent set of subspaces contain the trivial subspace? 

/ 4.33 Does every subspace have a complement? 

/ 4.34 Let $W_1$, $W_2$ be subspaces of a vector space. 

(a) Assume that the set $S_1$ spans $W_1$, and that the set $S_2$ spans $W_2$. Can $S_1 \cup S_2$ 
span $W_1 + W_2$? Must it? 

(b) Assume that $S_1$ is a linearly independent subset of $W_1$ and that $S_2$ is a linearly 
independent subset of $W_2$. Can $S_1 \cup S_2$ be a linearly independent subset of 
$W_1 + W_2$? Must it? 

4.35 When we decompose a vector space as a direct sum, the dimensions of the 
subspaces add to the dimension of the space. The situation with a space that is 
given as the sum of its subspaces is not as simple. This exercise considers the 
two-subspace special case. 

(a) For these subspaces of $\mathcal{M}_{2 \times 2}$ find $W_1 \cap W_2$, $\dim(W_1 \cap W_2)$, $W_1 + W_2$, and 
$\dim(W_1 + W_2)$. 

W, °)|c,deR} W2={(^° Jj|b,ceR} 

(b) Suppose that U and W are subspaces of a vector space. Suppose that the 
sequence $\langle \vec{\beta}_1, \ldots, \vec{\beta}_k \rangle$ is a basis for $U \cap W$. Finally, suppose that the prior 
sequence has been expanded to give a sequence $\langle \vec{\mu}_1, \ldots, \vec{\mu}_j, \vec{\beta}_1, \ldots, \vec{\beta}_k \rangle$ that is a 
basis for U, and a sequence $\langle \vec{\beta}_1, \ldots, \vec{\beta}_k, \vec{\omega}_1, \ldots, \vec{\omega}_p \rangle$ that is a basis for W. Prove 
that this sequence 

\[
\langle \vec{\mu}_1, \ldots, \vec{\mu}_j, \vec{\beta}_1, \ldots, \vec{\beta}_k, \vec{\omega}_1, \ldots, \vec{\omega}_p \rangle
\]

is a basis for the sum U + W. 

(c) Conclude that $\dim(U + W) = \dim(U) + \dim(W) - \dim(U \cap W)$. 

(d) Let $W_1$ and $W_2$ be eight-dimensional subspaces of a ten-dimensional space. 
List all values possible for $\dim(W_1 \cap W_2)$. 

4.36 Let $V = W_1 \oplus \cdots \oplus W_k$ and for each index i suppose that $S_i$ is a linearly 
independent subset of $W_i$. Prove that the union of the $S_i$'s is linearly independent. 

4.37 A matrix is symmetric if for each pair of indices i and j, the i, j entry equals 
the j, i entry. A matrix is antisymmetric if each i, j entry is the negative of the j, i 
entry. 

(a) Give a symmetric 2x2 matrix and an antisymmetric 2x2 matrix. (Remark. 
For the second one, be careful about the entries on the diagonal.) 

(b) What is the relationship between a square symmetric matrix and its transpose? 
Between a square antisymmetric matrix and its transpose? 

(c) Show that $\mathcal{M}_{n \times n}$ is the direct sum of the space of symmetric matrices and the 
space of antisymmetric matrices. 

4.38 Let $W_1$, $W_2$, $W_3$ be subspaces of a vector space. Prove that $(W_1 \cap W_2) + (W_1 \cap 
W_3) \subseteq W_1 \cap (W_2 + W_3)$. Does the inclusion reverse? 

4.39 The example of the x-axis and the y-axis in $\mathbb{R}^2$ shows that $W_1 \oplus W_2 = V$ does 
not imply that $W_1 \cup W_2 = V$. Can $W_1 \oplus W_2 = V$ and $W_1 \cup W_2 = V$ happen? 

/ 4.40 Consider Corollary 4.13. Does it work both ways? That is, supposing that $V = 
W_1 + \cdots + W_k$, is $V = W_1 \oplus \cdots \oplus W_k$ if and only if $\dim(V) = \dim(W_1) + \cdots + \dim(W_k)$? 

4.41 We know that if $V = W_1 \oplus W_2$ then there is a basis for V that splits into a 
basis for $W_1$ and a basis for $W_2$. Can we make the stronger statement that every 
basis for V splits into a basis for $W_1$ and a basis for $W_2$? 

4.42 We can ask about the algebra of the '+' operation. 

(a) Is it commutative; is $W_1 + W_2 = W_2 + W_1$? 

(b) Is it associative; is $(W_1 + W_2) + W_3 = W_1 + (W_2 + W_3)$? 

(c) Let W be a subspace of some vector space. Show that W + W = W. 

(d) Must there be an identity element, a subspace I such that I + W = W + I = W 
for all subspaces W? 

(e) Does left-cancellation hold: if $W_1 + W_2 = W_1 + W_3$ then $W_2 = W_3$? Right 
cancellation? 

4.43 Consider the algebraic properties of the direct sum operation. 

(a) Does direct sum commute: does $V = W_1 \oplus W_2$ imply that $V = W_2 \oplus W_1$? 

(b) Prove that direct sum is associative: $(W_1 \oplus W_2) \oplus W_3 = W_1 \oplus (W_2 \oplus W_3)$. 

(c) Show that $\mathbb{R}^3$ is the direct sum of the three axes (the relevance here is that by 
the previous item, we needn't specify which two of the three axes are combined 
first). 

(d) Does the direct sum operation left-cancel: does $W_1 \oplus W_2 = W_1 \oplus W_3$ imply 
$W_2 = W_3$? Does it right-cancel? 

(e) There is an identity element with respect to this operation. Find it. 

(f) Do some, or all, subspaces have inverses with respect to this operation: is 
there a subspace W of some vector space such that there is a subspace U with 
the property that $U \oplus W$ equals the identity element from the prior item? 



Fields 



Computations involving only integers or only rational numbers are much easier 
than those with real numbers. Could other algebraic structures, such as the 
integers or the rationals, work in the place of M in the definition of a vector 
space? 

Yes and no. If we take "work" to mean that the results of this chapter remain 
true then an analysis of the properties of the reals that we have used in this 
chapter gives a list of conditions that a structure needs in order to "work" in the 
place of R. 

0.1 Definition A field is a set $\mathcal{F}$ with two operations '+' and '·' such that 

(1) for any $a, b \in \mathcal{F}$ the result of a + b is in $\mathcal{F}$ and 

• a + b = b + a 

• if $c \in \mathcal{F}$ then a + (b + c) = (a + b) + c 

(2) for any $a, b \in \mathcal{F}$ the result of a · b is in $\mathcal{F}$ and 

• a · b = b · a 

• if $c \in \mathcal{F}$ then a · (b · c) = (a · b) · c 

(3) if $a, b, c \in \mathcal{F}$ then a · (b + c) = a · b + a · c 

(4) there is an element $0 \in \mathcal{F}$ such that 

• if $a \in \mathcal{F}$ then a + 0 = a 

• for each $a \in \mathcal{F}$ there is an element $-a \in \mathcal{F}$ such that (−a) + a = 0 

(5) there is an element $1 \in \mathcal{F}$ such that 

• if $a \in \mathcal{F}$ then a · 1 = a 

• for each element $a \neq 0$ of $\mathcal{F}$ there is an element $a^{-1} \in \mathcal{F}$ such that 
$a^{-1} \cdot a = 1$. 

The algebraic structure consisting of the set of real numbers along with its 
usual addition and multiplication operations is a field, naturally. Another field is 
the set of rational numbers with its usual addition and multiplication operations. 
An example of an algebraic structure that is not a field is the integers, because 
it fails the final condition. 

Some examples are surprising. The set {0, 1} under these operations: 

    +  | 0  1          ·  | 0  1 
    ---+------         ---+------ 
    0  | 0  1          0  | 0  0 
    1  | 1  0          1  | 0  1 

is a field (see Exercise 5). 
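
Because the set {0, 1} is finite, the conditions in the definition of a field can be checked by trying every combination of elements. The sketch below does that by brute force; the function `is_field` and the dictionary encoding of the operation tables are assumptions of this illustration, not notation from the text.

```python
from itertools import product

def is_field(elems, add, mul, zero, one):
    """Brute-force check of the field conditions on a finite set."""
    E = list(elems)
    pairs = list(product(E, repeat=2))
    triples = list(product(E, repeat=3))
    # Closure must hold before the other conditions can even be evaluated.
    if not all(add[a, b] in E and mul[a, b] in E for a, b in pairs):
        return False
    commutative = all(add[a, b] == add[b, a] and mul[a, b] == mul[b, a]
                      for a, b in pairs)
    associative = all(add[a, add[b, c]] == add[add[a, b], c] and
                      mul[a, mul[b, c]] == mul[mul[a, b], c]
                      for a, b, c in triples)
    distributive = all(mul[a, add[b, c]] == add[mul[a, b], mul[a, c]]
                       for a, b, c in triples)
    identities = all(add[a, zero] == a and mul[a, one] == a for a in E)
    negatives = all(any(add[b, a] == zero for b in E) for a in E)
    inverses = all(any(mul[b, a] == one for b in E) for a in E if a != zero)
    return all([commutative, associative, distributive,
                identities, negatives, inverses])

# The two tables for {0, 1}: addition with 1 + 1 = 0, and the usual product.
add = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
mul = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
print(is_field([0, 1], add, mul, zero=0, one=1))

# Changing 1 + 1 to 1 leaves 1 without an additive inverse.
bad_add = dict(add)
bad_add[(1, 1)] = 1
print(is_field([0, 1], bad_add, mul, zero=0, one=1))
```

An exhaustive check like this only works for finite sets; for the reals or the rationals the conditions have to be argued, as the exercises ask.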

We could in this book develop Linear Algebra as the theory of vector spaces
with scalars from an arbitrary field. In that case, almost all of the statements here
would carry over by replacing '$\mathbb{R}$' with '$\mathcal{F}$', that is, by taking coefficients, vector
entries, and matrix entries to be elements of $\mathcal{F}$ (the exceptions are statements
involving distances or angles). Here are some examples; each applies to a vector
space V over a field $\mathcal{F}$.

* For any $\vec{v} \in V$ and $a \in \mathcal{F}$, (i) $0 \cdot \vec{v} = \vec{0}$, (ii) $-1 \cdot \vec{v} + \vec{v} = \vec{0}$, and (iii) $a \cdot \vec{0} = \vec{0}$.

* The span, the set of linear combinations, of a subset of V is a subspace of 
V. 

* Any subset of a linearly independent set is also linearly independent. 

* In a finite-dimensional vector space, any two bases have the same number 
of elements. 

(Even statements that don't explicitly mention $\mathcal{F}$ use field properties in their
proof.) 

We will not develop vector spaces in this more general setting because the 
additional abstraction can be a distraction. The ideas we want to bring out 
already appear when we stick to the reals. 

The only exception is Chapter Five. There we must factor polynomials, so 
we will switch to considering vector spaces over the field of complex numbers. 

Exercises 

2 Show that the real numbers form a field. 

3 Prove that these are fields. 

(a) The rational numbers Q (b) The complex numbers C 

4 Give an example that shows that the integer number system is not a field. 

5 Consider the set S = {0, 1 } subject to the operations given above. Show that it is 
a field. 

6 Give suitable operations to make the set {0, 1,2} a field. 



Crystals 



Everyone has noticed that table salt comes in little cubes. 




The explanation for the cubical external shape is the simplest one that we could 
imagine: the internal shape, the way the atoms lie, is also cubical. The internal 
structure is pictured below. Salt is sodium chloride, and the small spheres shown 
are sodium while the big ones are chloride. To simplify the view, it only shows 
the sodiums and chlorides on the front, top, and right. 



The specks of salt that we see have many repetitions of this fundamental unit. 
A solid, such as table salt, with a regular internal structure is a crystal. 

We can restrict our attention to the front face. There we have a square 
repeated many times. 





The distance between the corners of the square cell is about 3.34 Angstroms (an
Angstrom is $10^{-10}$ meters). Obviously that unit is unwieldy. Instead we can
take as a unit the length of each square's side. That is, we naturally adopt this
basis.

$$\langle \begin{pmatrix} 3.34 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 3.34 \end{pmatrix} \rangle$$

Then we can describe, say, the corner in the upper right of the picture above as
$3\vec{\beta}_1 + 2\vec{\beta}_2$.
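
Moving between cell coordinates and Angstroms is a small linear computation. The sketch below assumes that the two basis vectors lie along the horizontal and vertical sides of the square cell with length 3.34; the function names are hypothetical and chosen only for this illustration.

```python
# Side length of the square cell in Angstroms, as quoted in the text.
SIDE = 3.34

# Basis vectors along two sides of the square cell (the orientation is an
# assumption made for this sketch).
BETA_1 = (SIDE, 0.0)
BETA_2 = (0.0, SIDE)

def to_angstroms(c1, c2):
    """Position in Angstroms of the point c1*beta_1 + c2*beta_2."""
    return (c1 * BETA_1[0] + c2 * BETA_2[0],
            c1 * BETA_1[1] + c2 * BETA_2[1])

def to_cell_coordinates(x, y):
    """Coordinates (c1, c2) of the point (x, y) with respect to the basis,
    found by solving the 2x2 linear system with Cramer's rule."""
    det = BETA_1[0] * BETA_2[1] - BETA_2[0] * BETA_1[1]
    c1 = (x * BETA_2[1] - BETA_2[0] * y) / det
    c2 = (BETA_1[0] * y - x * BETA_1[1]) / det
    return (c1, c2)

corner = to_angstroms(3, 2)           # the corner three cells over, two up
print(corner)                         # roughly 10.02 over and 6.68 up
print(to_cell_coordinates(*corner))   # recovers (3, 2) up to rounding
```

Using Cramer's rule rather than dividing each coordinate by 3.34 means the same two functions also work for a basis whose vectors are not perpendicular, as long as `BETA_1` and `BETA_2` are changed accordingly.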

Another crystal from everyday experience is pencil lead. It is graphite, 
formed from carbon atoms arranged in this shape. 





This is a single plane of graphite, called graphene. A piece of graphite consists 
of millions of these planes layered in a stack. The chemical bonds between the 
planes are much weaker than the bonds inside the planes, which explains why
pencils write — the graphite can be sheared so that the planes slide off and are 
left on the paper. 

We can get a convenient unit of length by decomposing the hexagonal ring 
into three regions that are rotations of this unit cell. 




The vectors that form the sides of that unit cell make a convenient basis. The 
distance along the bottom and slant is 1.42 Angstroms, so this




is a good basis. 

Another familiar crystal formed from carbon is diamond. Like table salt it
is built from cubes but the structure inside each cube is more complicated. In
addition to carbons at each corner,




there are carbons in the middle of each face.
(To show the new face carbons clearly, the corner carbons are reduced to dots.)
There are also four more carbons inside the cube, two that are a quarter of the 
way up from the bottom and two that are a quarter of the way down from the 
top. 




(As before, carbons shown earlier are reduced here to dots.) The distance
along any edge of the cube is 2.18 Angstroms. Thus, a natural basis for describing
the locations of the carbons and the bonds between them is this.

$$\langle \begin{pmatrix} 2.18 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 2.18 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 2.18 \end{pmatrix} \rangle$$

The examples here show that the structure of crystals is complicated enough
to need some organized system to give the locations of the atoms and how they 
are chemically bound. One tool for that organization is a convenient basis. This 
application of bases is simple but it shows a natural science context where the 
idea arises naturally. 

Exercises 

1 How many fundamental regions are there in one face of a speck of salt? (With a 
ruler, we can estimate that face is a square that is 0.1 cm on a side.) 

2 In the graphite picture, imagine that we are interested in a point 5.67 Angstroms 
over and 3.14 Angstroms up from the origin. 

(a) Express that point in terms of the basis given for graphite. 

(b) How many hexagonal shapes away is this point from the origin? 

(c) Express that point in terms of a second basis, where the first basis vector is 
the same, but the second is perpendicular to the first (going up the plane) and 
of the same length. 

3 Give the locations of the atoms in the diamond cube both in terms of the basis, 
and in Angstroms. 

4 This illustrates how we could compute the dimensions of a unit cell from the 
shape in which a substance crystallizes ([Ebbing], p. 462). 

(a) Recall that there are $6.022 \times 10^{23}$ atoms in a mole (this is Avogadro's number).
From that, and the fact that platinum has a mass of 195.08 grams per mole,
calculate the mass of each atom.

(b) Platinum crystallizes in a face-centered cubic lattice with atoms at each lattice
point, that is, it looks like the middle picture given above for the diamond crystal.
Find the number of platinum atoms per unit cell (hint: sum the fractions of platinum
atoms that are inside of a single cell).

(c) From that, find the mass of a unit cell.

(d) Platinum crystal has a density of 21.45 grams per cubic centimeter. From
this, and the mass of a unit cell, calculate the volume of a unit cell.

(e) Find the length of each edge.

(f) Describe a natural three-dimensional basis.



Voting Paradoxes 



Imagine that a Political Science class studying the American presidential process 
holds a mock election. The 29 class members rank the Democratic Party, 
Republican Party, and Third Party nominees, from most preferred to least 
preferred (> means 'is preferred to'). 

    preference order                   number with that preference

    Democrat > Republican > Third      5
    Democrat > Third > Republican      4
    Republican > Democrat > Third      2
    Republican > Third > Democrat      8
    Third > Democrat > Republican      8
    Third > Republican > Democrat      2



What is the preference of the group as a whole? 

Overall, the group prefers the Democrat to the Republican by five votes; 
seventeen voters ranked the Democrat above the Republican versus twelve the 
other way. And the group prefers the Republican to the Third's nominee, fifteen 
to fourteen. But, strangely enough, the group also prefers the Third to the 
Democrat, eighteen to eleven. 
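
These head-to-head tallies can be reproduced directly from the preference table. This is a small sketch; the dictionary layout and the name `prefer_count` are choices made for this illustration only.

```python
# Ballot counts from the mock election table: ranking (best first) -> voters.
BALLOTS = {
    ("D", "R", "T"): 5,
    ("D", "T", "R"): 4,
    ("R", "D", "T"): 2,
    ("R", "T", "D"): 8,
    ("T", "D", "R"): 8,
    ("T", "R", "D"): 2,
}

def prefer_count(ballots, x, y):
    """Number of voters who rank candidate x above candidate y."""
    return sum(n for ranking, n in ballots.items()
               if ranking.index(x) < ranking.index(y))

for x, y in [("D", "R"), ("R", "T"), ("T", "D")]:
    print(x, "over", y, ":", prefer_count(BALLOTS, x, y),
          "to", prefer_count(BALLOTS, y, x))
# prints 17 to 12, then 15 to 14, then 18 to 11
```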

[Figure: a cycle of arrows Democrat → Republican → Third → Democrat, labeled
with the margins 5 voters, 1 voter, and 7 voters.]

This is a voting paradox, specifically, a majority cycle. 

Mathematicians study voting paradoxes in part because of their implications 
for practical politics. For instance, the instructor of this class can manipulate 
them into choosing the Democrat as the overall winner by first asking for a vote 
to choose between the Republican and the Third, and then asking for a vote to 
choose between the winner of that contest, the Republican, and the Democrat.
The instructor can make either of the other two candidates come out as the winner
by similar manipulations. (Here we will stick to three-candidate elections but 
the same thing happens in larger elections.) 

Mathematicians also study voting paradoxes simply because they are interesting.
One interesting aspect is that the group's overall majority cycle occurs
even though each single voter's preference list is rational, in a straight-line order.
That is, the majority cycle seems to arise in the aggregate without being present 
in the components of that aggregate, the preference lists. However we can use 
linear algebra to argue that a tendency toward cyclic preference is actually 
present in each voter's list and that it surfaces when there is more adding of the 
tendency than canceling. 

For this, abbreviating the choices as D, R, and T, we can describe how a 
voter with preference order D > R > T contributes to the above cycle. 




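[Figure: the same cycle for one D > R > T voter, with the D-to-R and R-to-T
arrows each labeled 1 voter and the T-to-D arrow labeled -1 voter.]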



(The negative sign is here because the arrow describes T as preferred to D, but 
this voter likes them the other way.) The descriptions for the other preference 
lists are in the table on page 146. 

Now, to conduct the election we linearly combine these descriptions; for 
instance, the Political Science mock election 



5 · (D > R > T cycle) + 4 · (D > T > R cycle) + ··· + 2 · (T > R > D cycle)



yields the circular group preference shown earlier. 

Of course, taking linear combinations is linear algebra. The graphical cycle 
notation is suggestive but inconvenient so we use column vectors by starting at 
the D and taking the numbers from the cycle in counterclockwise order. Thus, 
we represent the mock election and a single D > R > T vote in this way. 



$$\begin{pmatrix} 7 \\ 1 \\ 5 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}$$



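We will decompose vote vectors into two parts, one cyclic and the other
acyclic. For the first part, we say that a vector is purely cyclic if it is in this
subspace of $\mathbb{R}^3$.

$$C = \{ \begin{pmatrix} k \\ k \\ k \end{pmatrix} \mid k \in \mathbb{R} \} = \{ k \cdot \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \mid k \in \mathbb{R} \}$$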






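For the second part, consider the set of vectors that are perpendicular to all of
the vectors in C. Exercise 6 shows that this is a subspace.

$$C^\perp = \{ \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} \mid \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} \cdot \begin{pmatrix} k \\ k \\ k \end{pmatrix} = 0 \text{ for all } k \in \mathbb{R} \}$$

$$= \{ \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} \mid c_1 + c_2 + c_3 = 0 \} = \{ c_2 \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + c_3 \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \mid c_2, c_3 \in \mathbb{R} \}$$

(Read the name as "C perp.") So we are led to this basis for $\mathbb{R}^3$.

$$\langle \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \rangle$$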




We can represent votes with respect to this basis, and thereby decompose them
into a cyclic part and an acyclic part. (Note for readers who have covered the
optional section in this chapter: that is, the space $\mathbb{R}^3$ is the direct sum of $C$
and $C^\perp$.)

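For example, consider the D > R > T voter discussed above. We represent it
with respect to the basis

$$\begin{array}{rcl} c_1 - c_2 - c_3 &=& -1 \\ c_1 + c_2 &=& 1 \\ c_1 + c_3 &=& 1 \end{array} \;\xrightarrow[-\rho_1+\rho_3]{-\rho_1+\rho_2}\;\xrightarrow{(-1/2)\rho_2+\rho_3}\; \begin{array}{rcl} c_1 - c_2 - c_3 &=& -1 \\ 2c_2 + c_3 &=& 2 \\ (3/2)c_3 &=& 1 \end{array}$$

using the coordinates $c_1 = 1/3$, $c_2 = 2/3$, and $c_3 = 2/3$. Then

$$\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix} = \frac{1}{3} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \frac{2}{3} \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + \frac{2}{3} \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix} + \begin{pmatrix} -4/3 \\ 2/3 \\ 2/3 \end{pmatrix}$$

gives the desired decomposition into a cyclic part and an acyclic part.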



Thus we can see that this D > R > T voter's rational preference list does have a 
cyclic part. 
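
The decomposition can also be computed without solving a linear system. The sketch below relies on the fact that the acyclic vectors are exactly those perpendicular to (1, 1, 1), so the cyclic part of a vote vector is its orthogonal projection onto that line; this shortcut and the name `decompose` are assumptions of the illustration, not the method used in the text, although both give the same answer.

```python
from fractions import Fraction

def decompose(vote):
    """Split a vote vector into (cyclic part, acyclic part).

    The cyclic part is the orthogonal projection onto the line spanned by
    (1, 1, 1); the acyclic part is what remains, and its entries sum to zero.
    """
    k = Fraction(sum(vote), 3)          # coefficient of (1, 1, 1)
    cyclic = (k, k, k)
    acyclic = tuple(Fraction(x) - k for x in vote)
    return cyclic, acyclic

cyclic, acyclic = decompose((-1, 1, 1))   # the D > R > T voter
print(cyclic)    # three entries equal to 1/3
print(acyclic)   # entries -4/3, 2/3, 2/3
```

Exact fractions are used so that the thirds are not blurred by floating-point rounding.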

The T > R > D voter is opposite to the one just considered in that the '>'
symbols are reversed. This voter's decomposition

$$\begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix} = \begin{pmatrix} -1/3 \\ -1/3 \\ -1/3 \end{pmatrix} + \begin{pmatrix} 4/3 \\ -2/3 \\ -2/3 \end{pmatrix}$$

shows that these opposite preferences have decompositions that are opposite.
We say that the first voter has positive spin since the cycle part is with the
direction that we have chosen for the arrows, while the second voter's spin is
negative.

The fact that these opposite voters cancel each other is reflected in the fact 
that their vote vectors add to zero. This suggests an alternate way to tally an 
election. We could first cancel as many opposite preference lists as possible, and 
then determine the outcome by adding the remaining lists. 

The rows of the table below contain the three pairs of opposite preference 
lists. The columns group those pairs by spin. For instance, the first row contains 
the two voters just considered. 



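(Each cycle is written as a row vector, starting at the D and going counterclockwise,
and is shown as its cyclic part plus its acyclic part.)

    positive spin                            negative spin

    Democrat > Republican > Third            Third > Republican > Democrat
    (-1, 1, 1)                               (1, -1, -1)
      = (1/3, 1/3, 1/3)                        = (-1/3, -1/3, -1/3)
      + (-4/3, 2/3, 2/3)                       + (4/3, -2/3, -2/3)

    Republican > Third > Democrat            Democrat > Third > Republican
    (1, 1, -1)                               (-1, -1, 1)
      = (1/3, 1/3, 1/3)                        = (-1/3, -1/3, -1/3)
      + (2/3, 2/3, -4/3)                       + (-2/3, -2/3, 4/3)

    Third > Democrat > Republican            Republican > Democrat > Third
    (1, -1, 1)                               (-1, 1, -1)
      = (1/3, 1/3, 1/3)                        = (-1/3, -1/3, -1/3)
      + (2/3, -4/3, 2/3)                       + (-2/3, 4/3, -2/3)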



If we conduct the election as just described then, after the cancellation of as
many opposite pairs of voters as possible, there will be left three sets of
preference lists: one set from the first row, one from the second row, and one
from the third row. We will finish by proving that a voting paradox can happen 
only if the spins of these three sets are in the same direction. That is, for a 
voting paradox to occur, the three remaining sets must all come from the left of 
the table or all come from the right (see Exercise 3). This shows that there is 
some connection between the majority cycle and the decomposition that we are 
using — a voting paradox can happen only when the tendencies toward cyclic 
preference reinforce each other. 

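For the proof, assume that we have cancelled opposite preference orders and
we are left with one set of preference lists from each of the three rows. Consider
the sum of these three (here, the numbers a, b, and c could be positive, negative,
or zero).

$$a \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix} + b \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} + c \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} = \begin{pmatrix} -a + b + c \\ a + b - c \\ a - b + c \end{pmatrix}$$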



A voting paradox occurs when the three numbers on the right, $a - b + c$ and
$a + b - c$ and $-a + b + c$, are all nonnegative or all nonpositive. On the left,
at least two of the three numbers a and b and c are both nonnegative or both
nonpositive. We can assume that they are a and b. That makes four cases: the
cycle is nonnegative and a and b are nonnegative, the cycle is nonpositive and
a and b are nonpositive, etc. We will do only the first case, since the second is
similar and the other two are also easy.

So assume that the cycle is nonnegative and that a and b are nonnegative.
The conditions $0 \le a - b + c$ and $0 \le -a + b + c$ add to give that $0 \le 2c$, which
implies that c is also nonnegative, as desired. That ends the proof.

This result says only that having all three spin in the same direction is a 
necessary condition for a majority cycle. It is not sufficient; see Exercise 4. 
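
The necessary condition can be sanity-checked by brute force over small integer values. This sketch is not a substitute for the proof; the names `group_totals` and `same_sign` and the search range are choices made for the illustration, with a, b, and c counting (with sign) the voters left in the three rows of the table after cancellation.

```python
from itertools import product

def group_totals(a, b, c):
    """The three cycle totals produced by a, b, c voters from the three rows."""
    return (-a + b + c, a + b - c, a - b + c)

def same_sign(nums):
    """True if the numbers are all nonnegative or all nonpositive."""
    return all(n >= 0 for n in nums) or all(n <= 0 for n in nums)

# Look for a majority cycle whose remaining voters do NOT share a spin.
# The range -6..6 is arbitrary; it only needs to be small enough to enumerate.
violations = [(a, b, c)
              for a, b, c in product(range(-6, 7), repeat=3)
              if same_sign(group_totals(a, b, c)) and not same_sign((a, b, c))]
print(violations)   # prints [], so no counterexample in this range
```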

Voting theory and associated topics are the subject of current research. There 
are many intriguing results, most notably the one produced by K. Arrow [Arrow],
who won the Nobel Prize in part for this work, showing that no voting system
is entirely fair (for a reasonable definition of "fair"). For more information, some
good introductory articles are [Gardner, 1970], [Gardner, 1974], [Gardner, 1980],
and [Neimi & Riker]. [Taylor] is a readable recent book. The long list of cases
from recent American political history in [Poundstone] shows these paradoxes 
are routinely manipulated in practice. 

This Topic is largely drawn from [Zwicker]. (Author's Note: I would like 
to thank Professor Zwicker for his kind and illuminating discussions.) 

Exercises 

1 Here is a reasonable way in which a voter could have a cyclic preference. Suppose 
that this voter ranks each candidate on each of three criteria. 

(a) Draw up a table with the rows labeled 'Democrat', 'Republican', and 'Third', 
and the columns labeled 'character', 'experience', and 'policies'. Inside each 
column, rank some candidate as most preferred, rank another as in the middle, 
and rank the remaining one as least preferred. 

(b) In this ranking, is the Democrat preferred to the Republican in (at least) two 
out of three criteria, or vice versa? Is the Republican preferred to the Third? 

(c) Does the table that was just constructed have a cyclic preference order? If 
not, make one that does. 

So it is possible for a voter to have a cyclic preference among candidates. The 
paradox described above, however, is that even if each voter has a straight-line 
preference list, a cyclic preference can still arise for the entire group. 

2 Compute the values in the table of decompositions. 

3 Do the cancellations of opposite preference orders for the Political Science class's 
mock election. Are all the remaining preferences from the left three rows of the 
table or from the right? 

4 The necessary condition that is proved above — a voting paradox can happen only 
if all three preference lists remaining after cancellation have the same spin — is not 
also sufficient. 

(a) Continuing the positive cycle case considered in the proof, use the two
inequalities $0 \le a - b + c$ and $0 \le -a + b + c$ to show that $|a - b| \le c$.

(b) Also show that $c \le a + b$, and hence that $|a - b| \le a + b$.

(c) Give an example of a vote where there is a majority cycle, and addition of
one more voter with the same spin causes the cycle to go away.

(d) Can the opposite happen; can addition of one voter with a "wrong" spin cause
a cycle to appear?

(e) Give a condition that is both necessary and sufficient to get a majority cycle. 

5 A one-voter election cannot have a majority cycle because of the requirement that 
we've imposed that the voter's list must be rational. 

(a) Show that a two-voter election may have a majority cycle. (We consider the
group preference a majority cycle if all three group totals are nonnegative or if
all three are nonpositive; that is, we allow some zeros in the group preference.)

(b) Show that for any number of voters greater than one, there is an election 
involving that many voters that results in a majority cycle. 

6 Let $U$ be a subspace of $\mathbb{R}^3$. Prove that the set $U^\perp = \{ \vec{v} \mid \vec{v} \cdot \vec{u} = 0 \text{ for all } \vec{u} \in U \}$
of vectors that are perpendicular to each vector in $U$ is also a subspace of $\mathbb{R}^3$. Does
this hold if $U$ is not a subspace?



Dimensional Analysis 

"You can't add apples and oranges," the old saying goes. It reflects our experience 
that in applications the quantities have units and keeping track of those units 
can help with problems. Everyone has done calculations such as this one that 
use the units as a check. 

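$$60\,\frac{\text{sec}}{\text{min}} \cdot 60\,\frac{\text{min}}{\text{hr}} \cdot 24\,\frac{\text{hr}}{\text{day}} \cdot 365\,\frac{\text{day}}{\text{year}} = 31\,536\,000\,\frac{\text{sec}}{\text{year}}$$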

However, we can take the idea of including the units beyond bookkeeping. We 
can use units to draw conclusions about what relationships are possible among 
the physical quantities. 

To start, consider the falling body equation $\text{distance} = 16 \cdot (\text{time})^2$. If the
distance is in feet and the time is in seconds then this is a true statement.
However it is not correct in other unit systems, because 16 isn't the right
constant in those systems. We can fix that by attaching units to the 16, making
it a dimensional constant.

$$\text{dist} = 16\,\frac{\text{ft}}{\text{sec}^2} \cdot (\text{time})^2$$

Now the equation holds also in the meter-second system because when we align
the units (a foot is approximately 0.30 meters),

$$\text{distance in meters} = 16\,\frac{0.30\,\text{m}}{\text{sec}^2} \cdot (\text{time in sec})^2 = 4.8\,\frac{\text{m}}{\text{sec}^2} \cdot (\text{time in sec})^2$$

the constant gets adjusted. So, in order to look at equations that are correct 
across unit systems, we restrict our attention to those that use dimensional 
constants; such an equation is said to be complete. 
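
The adjustment of the constant can be mirrored in a few lines. In this sketch the function name and the sample time of 2 seconds are arbitrary choices for illustration; the conversion factor 0.30 is the approximation quoted in the text.

```python
FEET_TO_METERS = 0.30   # the approximation used in the text

def falling_distance(time_sec, constant):
    """distance = constant * time^2; the units of the result follow the constant."""
    return constant * time_sec ** 2

const_ft = 16.0                         # ft/sec^2
const_m = const_ft * FEET_TO_METERS     # m/sec^2, about 4.8

t = 2.0                                 # seconds (an arbitrary sample time)
d_ft = falling_distance(t, const_ft)    # distance in feet
d_m = falling_distance(t, const_m)      # distance in meters
print(const_m, d_ft, d_m)
```

The two distances describe the same fall: converting `d_ft` to meters with the same factor gives `d_m`, which is what it means for the equation to be complete.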

Moving away from a specific unit system allows us to just say that we measure 
all quantities here in combinations of some units of length L, mass M, and time T. 
These three are our dimensions. For instance, we could measure velocity in 
feet/second or fathoms/hour but at all events it involves a unit of length divided
by a unit of time so the dimensional formula of velocity is $L/T$. Similarly, we
could state density's dimensional formula as $M/L^3$.

To write the dimensional formula we shall use negative exponents instead of
fractions and we shall include the dimensions with a zero exponent. Thus we
will write the dimensional formula of velocity as $L^1M^0T^{-1}$ and that of density
as $L^{-3}M^1T^0$.

Thus, "you can't add apples to oranges" becomes the advice to check that
all of an equation's terms have the same dimensional formula. An example is
this version of the falling body equation $d - gt^2 = 0$. The dimensional formula
of the $d$ term is $L^1M^0T^0$. For the other term, the dimensional formula of $g$
is $L^1M^0T^{-2}$ ($g$ is given above as 16 ft/sec$^2$) and the dimensional formula of $t$
is $L^0M^0T^1$ so that of the entire $gt^2$ term is $L^1M^0T^{-2}(L^0M^0T^1)^2 = L^1M^0T^0$.
Thus the two terms have the same dimensional formula. An equation with this
property is dimensionally homogeneous.
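
Checking dimensional homogeneity is arithmetic on exponents, so it is easy to mechanize. In the sketch below a dimensional formula is stored as its triple of exponents of L, M, and T; the helper names `dim`, `times`, and `power` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

def dim(L=0, M=0, T=0):
    """A dimensional formula L^a M^b T^c stored as the exponent triple (a, b, c)."""
    return (Fraction(L), Fraction(M), Fraction(T))

def times(x, y):
    """Multiplying quantities adds the exponents of their formulas."""
    return tuple(a + b for a, b in zip(x, y))

def power(x, r):
    """Raising a quantity to the power r multiplies its exponents by r."""
    return tuple(Fraction(r) * a for a in x)

d = dim(L=1)            # distance
g = dim(L=1, T=-2)      # acceleration, e.g. ft/sec^2
t = dim(T=1)            # time

gt2 = times(g, power(t, 2))
print(gt2 == d)         # True: both terms of d - g t^2 = 0 are L^1 M^0 T^0
```

The same helpers show that a ratio of two lengths is dimensionless: `times(dim(L=1), power(dim(L=1), -1))` equals `dim()`.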

Quantities with dimensional formula $L^0M^0T^0$ are dimensionless. For example,
we measure an angle by taking the ratio of the subtended arc to the
radius

[Figure: an angle with its subtended arc and radius r.]

which is the ratio of a length to a length $(L^1M^0T^0)(L^1M^0T^0)^{-1}$ and thus angles
have the dimensional formula $L^0M^0T^0$.

The classic example of using the units for more than bookkeeping, using
them to draw conclusions, considers the formula for the period of a pendulum.

    p = -some expression involving the length of the string, etc.-

The period is in units of time, $L^0M^0T^1$. So the quantities on the other side of
the equation must have dimensional formulas that combine in such a way that
their L's and M's cancel and only a single T remains. The table on page 151 has
the quantities that an experienced investigator would consider possibly relevant
to the period of a pendulum. The only dimensional formulas involving L are for
the length of the string and the acceleration due to gravity. For the L's of these
two to cancel, when they appear in the equation they must be in ratio, e.g., as
$(\ell/g)^2$, or as $\cos(\ell/g)$, or as $(\ell/g)^{-1}$. Therefore the period is a function of $\ell/g$.

This is a remarkable result: with a pencil and paper analysis, before we ever 
took out the pendulum and made measurements, we have determined something 
about what makes up its period. 

To do dimensional analysis systematically, we need to know two things
(arguments for these are in [Bridgman], Chapter II and IV). The first is that
each equation relating physical quantities that we shall see involves a sum of
terms, where each term has the form

$$m_1^{p_1} m_2^{p_2} \cdots m_k^{p_k}$$

for numbers $m_1, \ldots, m_k$ that measure the quantities.

For the second, observe that an easy way to construct a dimensionally
homogeneous expression is by taking a product of dimensionless quantities
or by adding such dimensionless terms. Buckingham's Theorem states that
any complete relationship among quantities with dimensional formulas can be
algebraically manipulated into a form where there is some function f such that

$$f(\Pi_1, \ldots, \Pi_n) = 0$$

for a complete set $\{\Pi_1, \ldots, \Pi_n\}$ of dimensionless products. (The first example
below describes what makes a set of dimensionless products 'complete'.) We
usually want to express one of the quantities, $m_1$ for instance, in terms of the
others, and for that we will assume that the above equality can be rewritten

$$m_1 = m_2^{-p_2} \cdots m_k^{-p_k} \cdot \hat{f}(\Pi_2, \ldots, \Pi_n)$$

where $\Pi_1 = m_1 m_2^{p_2} \cdots m_k^{p_k}$ is dimensionless and the products $\Pi_2, \ldots, \Pi_n$ don't
involve $m_1$ (as with $f$, here $\hat{f}$ is just some function, this time of $n - 1$ arguments).
Thus, to do dimensional analysis we should find which dimensionless products
are possible.

For example, consider again the formula for a pendulum's period.

    quantity                          dimensional formula

    period p                          $L^0M^0T^1$
    length of string $\ell$           $L^1M^0T^0$
    mass of bob m                     $L^0M^1T^0$
    acceleration due to gravity g     $L^1M^0T^{-2}$
    arc of swing $\theta$             $L^0M^0T^0$





By the first fact cited above, we expect the formula to have (possibly sums of
terms of) the form $p^{p_1}\ell^{p_2}m^{p_3}g^{p_4}\theta^{p_5}$. To use the second fact, to find which
combinations of the powers $p_1, \ldots, p_5$ yield dimensionless products, consider
this equation.

$$(L^0M^0T^1)^{p_1}(L^1M^0T^0)^{p_2}(L^0M^1T^0)^{p_3}(L^1M^0T^{-2})^{p_4}(L^0M^0T^0)^{p_5} = L^0M^0T^0$$

It gives three conditions on the powers.

$$\begin{array}{rcl} p_2 + p_4 &=& 0 \\ p_3 &=& 0 \\ p_1 - 2p_4 &=& 0 \end{array}$$

Note that $p_3 = 0$ so the mass of the bob does not affect the period. Gaussian
reduction and parametrization of that system gives this

$$\{ \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \end{pmatrix} = \begin{pmatrix} 1 \\ -1/2 \\ 0 \\ 1/2 \\ 0 \end{pmatrix} p_1 + \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} p_5 \mid p_1, p_5 \in \mathbb{R} \}$$

(we've taken $p_1$ as one of the parameters in order to express the period in terms
of the other quantities).

The set of dimensionless products contains all terms $p^{p_1}\ell^{p_2}m^{p_3}g^{p_4}\theta^{p_5}$
subject to the conditions above. This set forms a vector space under the '+'
operation of multiplying two such products and the '·' operation of raising such
a product to the power of the scalar (see Exercise 5). The term 'complete set of
dimensionless products' in Buckingham's Theorem means a basis for this vector
space.

We can get a basis by first taking $p_1 = 1$, $p_5 = 0$, and then taking $p_1 = 0$,
$p_5 = 1$. The associated dimensionless products are $\Pi_1 = p\ell^{-1/2}g^{1/2}$ and $\Pi_2 = \theta$.
Because the set $\{\Pi_1, \Pi_2\}$ is complete, Buckingham's Theorem says that

$$p = \ell^{1/2}g^{-1/2} \cdot \hat{f}(\theta) = \sqrt{\ell/g} \cdot \hat{f}(\theta)$$

where $\hat{f}$ is a function that we cannot determine from this analysis (a first year
physics text will show by other means that for small angles it is approximately
the constant function $\hat{f}(\theta) = 2\pi$).

Thus, analysis of the relationships that are possible between the quantities 
with the given dimensional formulas has given us a fair amount of information: a 
pendulum's period does not depend on the mass of the bob, and it rises with 
the square root of the length of the string. 
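
The Gaussian reduction and parametrization step for the pendulum can be sketched in code. The system used is $p_2 + p_4 = 0$, $p_3 = 0$, $p_1 - 2p_4 = 0$ in the unknowns $p_1, \ldots, p_5$. The function `null_space_basis` and its `pivot_order` argument are hypothetical helpers written for this illustration; pivoting on the columns for $p_2$, $p_3$, $p_4$ first is what leaves $p_1$ and $p_5$ as the parameters.

```python
from fractions import Fraction

def null_space_basis(rows, pivot_order=None):
    """Basis for the solutions of the homogeneous system with these coefficient
    rows, by Gauss-Jordan reduction over the rationals. Columns are tried as
    pivots in pivot_order; columns never used as pivots become parameters."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0])
    order = list(range(ncols)) if pivot_order is None else list(pivot_order)
    pivots = []                      # pivots[i] is the pivot column of row i
    r = 0
    for c in order:
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(ncols) if c not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * ncols
        v[f] = Fraction(1)
        for row, pc in zip(m, pivots):
            v[pc] = -row[f]          # solve the pivot variable for this parameter
        basis.append(v)
    return basis

PENDULUM = [
    [0, 1, 0, 1, 0],    # L:  p2 + p4 = 0
    [0, 0, 1, 0, 0],    # M:  p3 = 0
    [1, 0, 0, -2, 0],   # T:  p1 - 2 p4 = 0
]
for v in null_space_basis(PENDULUM, pivot_order=[1, 2, 3, 0, 4]):
    print([str(x) for x in v])
# ['1', '-1/2', '0', '1/2', '0']
# ['0', '0', '0', '0', '1']
```

The two vectors correspond to the dimensionless products $p\ell^{-1/2}g^{1/2}$ and $\theta$. A different pivot order gives a different, equally valid basis for the same space.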

For the next example we try to determine the period of revolution of two 
bodies in space orbiting each other under mutual gravitational attraction. An 
experienced investigator could expect that these are the relevant quantities. 













    quantity                          dimensional formula

    period p                          $L^0M^0T^1$
    mean separation r                 $L^1M^0T^0$
    first mass $m_1$                  $L^0M^1T^0$
    second mass $m_2$                 $L^0M^1T^0$
    gravitational constant G          $L^3M^{-1}T^{-2}$

To get the complete set of dimensionless products we consider the equation

$$(L^0M^0T^1)^{p_1}(L^1M^0T^0)^{p_2}(L^0M^1T^0)^{p_3}(L^0M^1T^0)^{p_4}(L^3M^{-1}T^{-2})^{p_5} = L^0M^0T^0$$

which results in a system

$$\begin{array}{rcl} p_2 + 3p_5 &=& 0 \\ p_3 + p_4 - p_5 &=& 0 \\ p_1 - 2p_5 &=& 0 \end{array}$$

with this solution.

$$\{ \begin{pmatrix} 1 \\ -3/2 \\ 1/2 \\ 0 \\ 1/2 \end{pmatrix} p_1 + \begin{pmatrix} 0 \\ 0 \\ -1 \\ 1 \\ 0 \end{pmatrix} p_4 \mid p_1, p_4 \in \mathbb{R} \}$$

As earlier, the set of dimensionless products of these quantities forms a
vector space and we want to produce a basis for that space, a 'complete' set of
dimensionless products. One such set, gotten from setting $p_1 = 1$ and $p_4 = 0$
and also setting $p_1 = 0$ and $p_4 = 1$, is $\{\Pi_1 = pr^{-3/2}m_1^{1/2}G^{1/2},\ \Pi_2 = m_1^{-1}m_2\}$.
With that, Buckingham's Theorem says that any complete relationship among
these quantities is stateable in this form.

$$p = r^{3/2}m_1^{-1/2}G^{-1/2} \cdot \hat{f}(m_1^{-1}m_2) = \frac{r^{3/2}}{\sqrt{Gm_1}} \cdot \hat{f}(m_2/m_1)$$
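
As a check on the two products in this example, their dimensional formulas can be computed from the exponent vectors. The sketch assumes the standard dimensional formula $L^3M^{-1}T^{-2}$ for the gravitational constant, which is consistent with the three-equation system above; the name `formula_of_product` is hypothetical.

```python
from fractions import Fraction as F

# Dimensional formulas (exponents of L, M, T) for p, r, m1, m2, G.
FORMULAS = [
    (0, 0, 1),     # period p
    (1, 0, 0),     # mean separation r
    (0, 1, 0),     # first mass m1
    (0, 1, 0),     # second mass m2
    (3, -1, -2),   # gravitational constant G (assumed standard formula)
]

def formula_of_product(powers):
    """Exponents of L, M, T for the product p^p1 r^p2 m1^p3 m2^p4 G^p5."""
    return tuple(sum(F(pw) * f[i] for pw, f in zip(powers, FORMULAS))
                 for i in range(3))

PI_1 = (1, F(-3, 2), F(1, 2), 0, F(1, 2))   # p r^(-3/2) m1^(1/2) G^(1/2)
PI_2 = (0, 0, -1, 1, 0)                     # m2 / m1

print(formula_of_product(PI_1))   # all three exponents are zero
print(formula_of_product(PI_2))   # all three exponents are zero
```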

Remark. An important application of the prior formula is when $m_1$ is the
mass of the sun and $m_2$ is the mass of a planet. Because $m_1$ is very much greater
than $m_2$, the argument to $\hat{f}$ is approximately 0, and we can wonder whether
this part of the formula remains approximately constant as $m_2$ varies. One way
to see that it does is this. The sun is so much larger than the planet that the
mutual rotation is approximately about the sun's center. If we vary the planet's
mass $m_2$ by a factor of $x$ (e.g., Venus's mass is $x = 0.815$ times Earth's mass),
then the force of attraction is multiplied by $x$, and $x$ times the force acting on
$x$ times the mass gives, since $F = ma$, the same acceleration, about the same
center (approximately). Hence, the orbit will be the same and so its period
will be the same, and thus the right side of the above equation also remains
unchanged (approximately). Therefore, $\hat{f}(m_2/m_1)$ is approximately constant as
$m_2$ varies. This is Kepler's Third Law: the square of the period of a planet is
proportional to the cube of the mean radius of its orbit about the sun.

The final example was one of the first explicit applications of dimensional
analysis. Lord Rayleigh considered the speed of a wave in deep water and
suggested these as the relevant quantities.

    quantity                          dimensional formula

    velocity of the wave v            $L^1M^0T^{-1}$
    density of the water d            $L^{-3}M^1T^0$
    acceleration due to gravity g     $L^1M^0T^{-2}$
    wavelength $\lambda$              $L^1M^0T^0$





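The equation

$$(L^1M^0T^{-1})^{p_1}(L^{-3}M^1T^0)^{p_2}(L^1M^0T^{-2})^{p_3}(L^1M^0T^0)^{p_4} = L^0M^0T^0$$

gives this system

$$\begin{array}{rcl} p_1 - 3p_2 + p_3 + p_4 &=& 0 \\ p_2 &=& 0 \\ -p_1 - 2p_3 &=& 0 \end{array}$$

with this solution space.

$$\{ \begin{pmatrix} 1 \\ 0 \\ -1/2 \\ -1/2 \end{pmatrix} p_1 \mid p_1 \in \mathbb{R} \}$$

There is one dimensionless product, $\Pi_1 = vg^{-1/2}\lambda^{-1/2}$, and so $v$ is $\sqrt{\lambda g}$ times
a constant; $\hat{f}$ is constant since it is a function of no arguments. The quantity $d$
is not involved in the relationship.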

The three examples above show that dimensional analysis can bring us far 
toward expressing the relationship among the quantities. For further reading, 
the classic reference is [Bridgman] — this brief book is delightful. Another source 
is [Giordano, Wells, Wilde]. A description of dimensional analysis's place in 
modeling is in [Giordano, Jaye, Weir]. 

Exercises 

1 [de Mestre] Consider a projectile, launched with initial velocity $v_0$, at an angle $\theta$.
To study its motion we may guess that these are the relevant quantities.

    quantity                          dimensional formula

    horizontal position x             $L^1M^0T^0$
    vertical position y               $L^1M^0T^0$
    initial speed $v_0$               $L^1M^0T^{-1}$
    angle of launch $\theta$          $L^0M^0T^0$
    acceleration due to gravity g     $L^1M^0T^{-2}$
    time t                            $L^0M^0T^1$

(a) Show that $\{gt/v_0,\ gx/v_0^2,\ gy/v_0^2,\ \theta\}$ is a complete set of dimensionless products.
(Hint. One way to go is to find the appropriate free variables in the linear system
that arises but there is a shortcut that uses the properties of a basis.)

(b) These two equations of motion for projectiles are familiar: $x = v_0\cos(\theta)t$ and
$y = v_0\sin(\theta)t - (g/2)t^2$. Manipulate each to rewrite it as a relationship among
the dimensionless products of the prior item.

2 [Einstein] conjectured that the infrared characteristic frequencies of a solid might
be determined by the same forces between atoms as determine the solid's ordinary
elastic behavior. The relevant quantities are these.

    quantity                            dimensional formula

    characteristic frequency $\nu$      $L^0M^0T^{-1}$
    compressibility k                   $L^1M^{-1}T^2$
    number of atoms per cubic cm N      $L^{-3}M^0T^0$
    mass of an atom m                   $L^0M^1T^0$

Show that there is one dimensionless product. Conclude that, in any complete
relationship among quantities with these dimensional formulas, $k$ is a constant
times $\nu^{-2}N^{-1/3}m^{-1}$. This conclusion played an important role in the early study
of quantum phenomena.

3 [Giordano, Wells, Wilde] The torque produced by an engine has dimensional
formula $L^2M^1T^{-2}$. We may first guess that it depends on the engine's rotation
rate (with dimensional formula $L^0M^0T^{-1}$), and the volume of air displaced (with
dimensional formula $L^3M^0T^0$).

(a) Try to find a complete set of dimensionless products. What goes wrong?

(b) Adjust the guess by adding the density of the air (with dimensional formula
$L^{-3}M^1T^0$). Now find a complete set of dimensionless products.

4 [Tilley] Dominoes falling make a wave. We may conjecture that the wave speed v 
depends on the spacing d between the dominoes, the height h of each domino, and 
the acceleration due to gravity g. 

(a) Find the dimensional formula for each of the four quantities.

(b) Show that $\{\Pi_1 = h/d,\ \Pi_2 = dg/v^2\}$ is a complete set of dimensionless products.

(c) Show that if h/d is fixed then the propagation speed is proportional to the 
square root of d. 

5 Prove that the dimensionless products form a vector space under the $\vec{+}$ operation
of multiplying two such products and the $\vec{\cdot}$ operation of raising such a product
to the power of the scalar. (The vector arrows are a precaution against confusion.)
That is, prove that, for any particular homogeneous system, this set of products of
powers of $m_1, \ldots, m_k$

$$\{ m_1^{p_1} \cdots m_k^{p_k} \mid p_1, \ldots, p_k \text{ satisfy the system} \}$$

is a vector space under:

$$m_1^{p_1} \cdots m_k^{p_k} \;\vec{+}\; m_1^{q_1} \cdots m_k^{q_k} = m_1^{p_1+q_1} \cdots m_k^{p_k+q_k}$$

and

$$r \;\vec{\cdot}\; (m_1^{p_1} \cdots m_k^{p_k}) = m_1^{rp_1} \cdots m_k^{rp_k}$$

(assume that all variables represent real numbers).

6 The advice about apples and oranges is not right. Consider the familiar equations
for a circle $C = 2\pi r$ and $A = \pi r^2$.

(a) Check that C and A have different dimensional formulas. 

(b) Produce an equation that is not dimensionally homogeneous (i.e., it adds 
apples and oranges) but is nonetheless true of any circle. 

(c) The prior item asks for an equation that is complete but not dimensionally 
homogeneous. Produce an equation that is dimensionally homogeneous but not 
complete. 

(Just because the old saying isn't strictly right, doesn't keep it from being a 
useful strategy. Dimensional homogeneity is often used to check the plausibility 
of equations used in models. For an argument that any complete equation can 
easily be made dimensionally homogeneous, see [Bridgman], Chapter I, especially 
page 15.) 
Maps Between Spaces



I Isomorphisms 



In the examples following the definition of a vector space we expressed the idea 
that some spaces are "the same" as others. For instance, the space of two-tall 
column vectors and the space of two-wide row vectors are not equal because 
their elements — column vectors and row vectors — are not equal, but we have 
the idea that these spaces differ only in how their elements appear. We will now 
make this intuition precise. 

This section illustrates a common aspect of a mathematical investigation. 
With the help of some examples, we've gotten an idea. We will next give a formal 
definition and then we will produce some results backing our contention that 
the definition captures the idea. We've seen this happen already, for instance in 
the first section of the Vector Space chapter. There, the study of linear systems 
led us to consider collections closed under linear combinations. We defined such 
a collection as a vector space and we followed it with some supporting results. 

That definition wasn't an end point, instead it led to new insights such as the 
idea of a basis. Here too, after producing a definition and supporting it, we will 
get two surprises (pleasant ones). First, we will find that the definition applies 
to some unforeseen, and interesting, cases. Second, the study of the definition 
will lead to new ideas. In this way, our investigation will build momentum. 



I.1 Definition and Examples

We start with two examples that suggest the right definition. 

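1.1 Example The space of two-wide row vectors and the space of two-tall column
vectors are "the same" in that if we associate the vectors that have the same
components, e.g.,

$\begin{pmatrix} 1 & 2 \end{pmatrix} \longleftrightarrow \begin{pmatrix} 1 \\ 2 \end{pmatrix}$

then this correspondence preserves the operations, for instance this addition

$\begin{pmatrix} 1 & 2 \end{pmatrix} + \begin{pmatrix} 3 & 4 \end{pmatrix} = \begin{pmatrix} 4 & 6 \end{pmatrix} \quad\longleftrightarrow\quad \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$

and this scalar multiplication.

$5 \cdot \begin{pmatrix} 1 & 2 \end{pmatrix} = \begin{pmatrix} 5 & 10 \end{pmatrix} \quad\longleftrightarrow\quad 5 \cdot \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 10 \end{pmatrix}$

More generally stated, under the correspondence

$\begin{pmatrix} a_0 & a_1 \end{pmatrix} \longleftrightarrow \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}$

both operations are preserved:

$\begin{pmatrix} a_0 & a_1 \end{pmatrix} + \begin{pmatrix} b_0 & b_1 \end{pmatrix} = \begin{pmatrix} a_0 + b_0 & a_1 + b_1 \end{pmatrix} \quad\longleftrightarrow\quad \begin{pmatrix} a_0 \\ a_1 \end{pmatrix} + \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = \begin{pmatrix} a_0 + b_0 \\ a_1 + b_1 \end{pmatrix}$

and

$r \cdot \begin{pmatrix} a_0 & a_1 \end{pmatrix} = \begin{pmatrix} ra_0 & ra_1 \end{pmatrix} \quad\longleftrightarrow\quad r \cdot \begin{pmatrix} a_0 \\ a_1 \end{pmatrix} = \begin{pmatrix} ra_0 \\ ra_1 \end{pmatrix}$

(all of the variables are real numbers).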

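1.2 Example Another two spaces we can think of as "the same" are $\mathcal{P}_2$, the space
of quadratic polynomials, and $\mathbb{R}^3$. A natural correspondence is this.

$a_0 + a_1x + a_2x^2 \longleftrightarrow \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix}$ (e.g., $1 + 2x + 3x^2 \longleftrightarrow \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$)

This preserves structure: corresponding elements add in a corresponding way

$(a_0 + a_1x + a_2x^2) + (b_0 + b_1x + b_2x^2) = (a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2 \quad\longleftrightarrow\quad \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} + \begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} a_0 + b_0 \\ a_1 + b_1 \\ a_2 + b_2 \end{pmatrix}$

and scalar multiplication also corresponds.

$r \cdot (a_0 + a_1x + a_2x^2) = (ra_0) + (ra_1)x + (ra_2)x^2 \quad\longleftrightarrow\quad r \cdot \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} ra_0 \\ ra_1 \\ ra_2 \end{pmatrix}$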
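The correspondence in Example 1.2 can be spot-checked mechanically. The following is a minimal sketch, assuming a quadratic polynomial is stored as a dictionary from exponent to coefficient; the helper names (`to_column`, `poly_add`, and so on) are illustrative choices and not part of the text. Agreement on sample inputs illustrates the claim but is not a proof.

```python
def to_column(poly):
    """Send a0 + a1*x + a2*x^2, stored as {0: a0, 1: a1, 2: a2}, to (a0, a1, a2)."""
    return tuple(poly.get(k, 0) for k in range(3))

def poly_add(p, q):
    """Add two quadratics coefficient by coefficient."""
    return {k: p.get(k, 0) + q.get(k, 0) for k in range(3)}

def poly_scale(r, p):
    """Multiply a quadratic by the scalar r."""
    return {k: r * p.get(k, 0) for k in range(3)}

def vec_add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def vec_scale(r, u):
    return tuple(r * a for a in u)

p = {0: 1, 1: 2, 2: 3}   # 1 + 2x + 3x^2
q = {0: 4, 2: -1}        # 4 - x^2
r = 5

# Adding and then translating agrees with translating and then adding.
assert to_column(poly_add(p, q)) == vec_add(to_column(p), to_column(q))
# The same holds for scalar multiplication.
assert to_column(poly_scale(r, p)) == vec_scale(r, to_column(p))
```
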



1.3 Definition An isomorphism between two vector spaces V and W is a map
$f\colon V \to W$ that

(1) is a correspondence: f is one-to-one and onto;*

(2) preserves structure: if $\vec{v}_1, \vec{v}_2 \in V$ then

$f(\vec{v}_1 + \vec{v}_2) = f(\vec{v}_1) + f(\vec{v}_2)$

and if $\vec{v} \in V$ and $r \in \mathbb{R}$ then

$f(r\vec{v}) = r f(\vec{v})$

(we write $V \cong W$, read "V is isomorphic to W", when such a map exists).

("Morphism" means map, so "isomorphism" means a map expressing sameness.) 

1.4 Example The vector space $G = \{c_1\cos\theta + c_2\sin\theta \mid c_1, c_2 \in \mathbb{R}\}$ of functions
of $\theta$ is isomorphic to the vector space $\mathbb{R}^2$ under this map.

$c_1\cos\theta + c_2\sin\theta \;\stackrel{f}{\longmapsto}\; \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$

We will check this by going through the conditions in the definition.

We will first verify condition (1), that the map is a correspondence between
the sets underlying the spaces.

To establish that f is one-to-one, we must prove that $f(\vec{a}) = f(\vec{b})$ only when
$\vec{a} = \vec{b}$. If

$f(a_1\cos\theta + a_2\sin\theta) = f(b_1\cos\theta + b_2\sin\theta)$

then, by the definition of f,

$\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$

from which we can conclude that $a_1 = b_1$ and $a_2 = b_2$ because column vectors
are equal only when they have equal components.

To check that f is onto we must prove that any member of the codomain $\mathbb{R}^2$
is the image of some member of the domain G. But that's clear since

$\begin{pmatrix} x \\ y \end{pmatrix}$

is the image under f of $x\cos\theta + y\sin\theta$.

Next we will verify condition (2), that f preserves structure. 



*More information on one-to-one and onto maps is in the appendix. 
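This computation shows that f preserves addition.

$f\bigl( (a_1\cos\theta + a_2\sin\theta) + (b_1\cos\theta + b_2\sin\theta) \bigr)$

$= f\bigl( (a_1 + b_1)\cos\theta + (a_2 + b_2)\sin\theta \bigr)$

$= \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \end{pmatrix}$

$= \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$

$= f(a_1\cos\theta + a_2\sin\theta) + f(b_1\cos\theta + b_2\sin\theta)$

A similar computation shows that f preserves scalar multiplication.

$f\bigl( r \cdot (a_1\cos\theta + a_2\sin\theta) \bigr) = f( ra_1\cos\theta + ra_2\sin\theta )$

$= \begin{pmatrix} ra_1 \\ ra_2 \end{pmatrix}$

$= r \cdot \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$

$= r \cdot f(a_1\cos\theta + a_2\sin\theta)$

With that, conditions (1) and (2) are verified, so we know that f is an
isomorphism and we can say that the spaces are isomorphic, $G \cong \mathbb{R}^2$.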

1.5 Example Let V be the space $\{c_1x + c_2y + c_3z \mid c_1, c_2, c_3 \in \mathbb{R}\}$ of linear
combinations of three variables x, y, and z, under the natural addition and scalar
multiplication operations. Then V is isomorphic to $\mathcal{P}_2$, the space of quadratic
polynomials.

To show this we will produce an isomorphism map. There is more than one
possibility; for instance, here are four.

$c_1x + c_2y + c_3z \;\stackrel{f_1}{\longmapsto}\; c_1 + c_2x + c_3x^2$

$c_1x + c_2y + c_3z \;\stackrel{f_2}{\longmapsto}\; c_2 + c_3x + c_1x^2$

$c_1x + c_2y + c_3z \;\stackrel{f_3}{\longmapsto}\; -c_1 - c_2x - c_3x^2$

$c_1x + c_2y + c_3z \;\stackrel{f_4}{\longmapsto}\; c_1 + (c_1 + c_2)x + (c_1 + c_3)x^2$

The first map is the more natural correspondence in that it just carries the
coefficients over. However, below we shall verify that the second one is an
isomorphism, to underline that there are isomorphisms other than just the
obvious one (showing that $f_1$ is an isomorphism is Exercise 13).

To show that $f_2$ is one-to-one, we will prove that if $f_2(c_1x + c_2y + c_3z) =
f_2(d_1x + d_2y + d_3z)$ then $c_1x + c_2y + c_3z = d_1x + d_2y + d_3z$. The assumption
that $f_2(c_1x + c_2y + c_3z) = f_2(d_1x + d_2y + d_3z)$ gives, by the definition of $f_2$, that
$c_2 + c_3x + c_1x^2 = d_2 + d_3x + d_1x^2$. Equal polynomials have equal coefficients,
so $c_2 = d_2$, $c_3 = d_3$, and $c_1 = d_1$. Therefore $f_2$ is one-to-one.

The map $f_2$ is onto because any member $a + bx + cx^2$ of the codomain is
the image of a member of the domain, namely $cx + ay + bz$. For instance,
$2 + 3x - 4x^2$ is $f_2(-4x + 2y + 3z)$.

The computations for structure preservation are like those in the prior
example. This map preserves addition

$f_2\bigl( (c_1x + c_2y + c_3z) + (d_1x + d_2y + d_3z) \bigr)$

$= f_2\bigl( (c_1 + d_1)x + (c_2 + d_2)y + (c_3 + d_3)z \bigr)$

$= (c_2 + d_2) + (c_3 + d_3)x + (c_1 + d_1)x^2$

$= (c_2 + c_3x + c_1x^2) + (d_2 + d_3x + d_1x^2)$

$= f_2(c_1x + c_2y + c_3z) + f_2(d_1x + d_2y + d_3z)$

and scalar multiplication.

$f_2\bigl( r \cdot (c_1x + c_2y + c_3z) \bigr) = f_2(rc_1x + rc_2y + rc_3z)$

$= rc_2 + rc_3x + rc_1x^2$

$= r \cdot (c_2 + c_3x + c_1x^2)$

$= r \cdot f_2(c_1x + c_2y + c_3z)$

Thus $f_2$ is an isomorphism and we write $V \cong \mathcal{P}_2$.

Every space is isomorphic to itself under the identity map.

1.6 Definition An automorphism is an isomorphism of a space with itself.

1.7 Example A dilation map $d_s\colon \mathbb{R}^2 \to \mathbb{R}^2$ that multiplies all vectors by a nonzero
scalar s is an automorphism of $\mathbb{R}^2$.

A rotation or turning map $t_\theta\colon \mathbb{R}^2 \to \mathbb{R}^2$ that rotates all vectors through an
angle $\theta$ is an automorphism.

A third type of automorphism of $\mathbb{R}^2$ is a map $f_\ell\colon \mathbb{R}^2 \to \mathbb{R}^2$ that flips or reflects
all vectors over a line $\ell$ through the origin.

Checking that these are automorphisms is Exercise 30.

1.8 Example Consider the space $\mathcal{P}_5$ of polynomials of degree 5 or less and the
map f that sends a polynomial p(x) to p(x - 1). For instance, under this map
$x^2 \mapsto (x-1)^2 = x^2 - 2x + 1$ and $x^3 + 2x \mapsto (x-1)^3 + 2(x-1) = x^3 - 3x^2 + 5x - 3$.
This map is an automorphism of this space; the check is Exercise 22.

This isomorphism of $\mathcal{P}_5$ with itself does more than just tell us that the space
is "the same" as itself. It gives us some insight into the space's structure. For
instance, below is shown a family of parabolas, graphs of members of $\mathcal{P}_5$. Each
has a vertex at y = -1, and the left-most one has zeroes at -2.25 and -1.75,
the next one has zeroes at -1.25 and -0.75, etc.

[Figure: a family of parabolas, each a horizontal shift by one of its neighbor; two adjacent ones are labeled $p_0$ and $p_1$.]

Substitution of x - 1 for x in any function's argument shifts its graph to the
right by one. Thus, $f(p_0) = p_1$. Notice that the picture before f is applied is the
same as the picture after f is applied because while each parabola moves to the
right, another one comes in from the left to take its place. This also holds true
for cubics, etc. So the automorphism f expresses the idea that $\mathcal{P}_5$ has a certain
horizontal-homogeneity, that this space looks the same near x = 1 as near x = 0.
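The shift map of Example 1.8 is easy to experiment with. Below is a minimal sketch, assuming a member of the space is stored as a list of six coefficients, lowest degree first; the function name `shift` and this storage convention are illustrative assumptions, not anything fixed by the text. It reproduces the two sample images given in the example and checks, on one input, that shifting back by one undoes the map.

```python
def shift(coeffs, k=1):
    """Coefficients (lowest degree first) of p(x - k), given those of p(x)."""
    result = [0] * len(coeffs)
    # Horner's rule: p(x - k) = (...(a_n*(x - k) + a_{n-1})*(x - k) + ...) + a_0
    for a in reversed(coeffs):
        times_x = [0] + result[:-1]                        # running polynomial times x
        result = [s - k * r for s, r in zip(times_x, result)]  # times (x - k)
        result[0] += a                                     # add the next coefficient
    return result

x_squared = [0, 0, 1, 0, 0, 0]       # x^2
cubic = [0, 2, 0, 1, 0, 0]           # x^3 + 2x

print(shift(x_squared))              # [1, -2, 1, 0, 0, 0], that is x^2 - 2x + 1
print(shift(cubic))                  # [-3, 5, -3, 1, 0, 0], that is x^3 - 3x^2 + 5x - 3

# Shifting left by one is the inverse map on this input.
assert shift(shift(cubic, 1), -1) == cubic
```

That the output list has the same length as the input reflects the fact that substitution of x - 1 does not raise the degree, so the map sends the space to itself.
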

As described in the opening to this section, having given the definition of 
isomorphism, we next support the contention that it captures our intuition of 
vector spaces being the same. Of course, the definition itself is persuasive: a 
vector space consists of a set and some structure and the definition simply 
requires that the sets correspond and that the structures correspond also. Also 
persuasive are the examples above, such as Example 1.1 giving the isomorphism 
between the space of two-wide row vectors and the space of two-tall column 
vectors, which dramatize that isomorphic spaces are the same in all relevant 
respects. Sometimes people say, where $V \cong W$, that "W is just V painted
green" — differences are merely cosmetic. 

The results below further support our contention that under an isomorphism 
all the things of interest in the two vector spaces correspond. Because we 
introduced vector spaces to study linear combinations, "of interest" means 
"pertaining to linear combinations." Not of interest is the way that the vectors 
are presented typographically (or their color!). 

1.9 Lemma An isomorphism maps a zero vector to a zero vector. 



Proof Where $f\colon V \to W$ is an isomorphism, fix any $\vec{v} \in V$. Then $f(\vec{0}_V) =
f(0 \cdot \vec{v}) = 0 \cdot f(\vec{v}) = \vec{0}_W$. QED
1.10 Lemma For any map $f\colon V \to W$ between vector spaces these statements are
equivalent.

(1) f preserves structure

$f(\vec{v}_1 + \vec{v}_2) = f(\vec{v}_1) + f(\vec{v}_2)$ and $f(c\vec{v}) = c\,f(\vec{v})$

(2) f preserves linear combinations of two vectors

$f(c_1\vec{v}_1 + c_2\vec{v}_2) = c_1 f(\vec{v}_1) + c_2 f(\vec{v}_2)$

(3) f preserves linear combinations of any finite number of vectors

$f(c_1\vec{v}_1 + \cdots + c_n\vec{v}_n) = c_1 f(\vec{v}_1) + \cdots + c_n f(\vec{v}_n)$

Proof Since the implications (3) $\Rightarrow$ (2) and (2) $\Rightarrow$ (1) are clear, we need
only show that (1) $\Rightarrow$ (3). Assume statement (1). We will prove statement (3)
by induction on the number of summands n.

The one-summand base case, that $f(c\vec{v}_1) = c\,f(\vec{v}_1)$, is covered by the
assumption of statement (1).

For the inductive step assume that statement (3) holds whenever there are k
or fewer summands, that is, whenever n = 1, or n = 2, ..., or n = k. Consider
the (k+1)-summand case. Use the first half of (1) to break the sum along the
final '+'.

$f(c_1\vec{v}_1 + \cdots + c_k\vec{v}_k + c_{k+1}\vec{v}_{k+1}) = f(c_1\vec{v}_1 + \cdots + c_k\vec{v}_k) + f(c_{k+1}\vec{v}_{k+1})$

Use the inductive hypothesis to break up the k-term sum on the left.

$= f(c_1\vec{v}_1) + \cdots + f(c_k\vec{v}_k) + f(c_{k+1}\vec{v}_{k+1})$

Now the second half of (1) gives

$= c_1 f(\vec{v}_1) + \cdots + c_k f(\vec{v}_k) + c_{k+1} f(\vec{v}_{k+1})$

when applied k+1 times. QED

Using item (2) is a standard way to verify that a map preserves structure. 
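Item (2) also suggests a quick numerical screen for a proposed map. The sketch below assumes vectors are stored as tuples of numbers; the names `combo` and `preserves_item_2`, and the two sample maps, are hypothetical choices made for illustration. Passing the screen on finitely many samples does not prove that a map preserves structure, but a single failure does show that it does not.

```python
from fractions import Fraction
from itertools import product

def combo(c1, v1, c2, v2):
    """The linear combination c1*v1 + c2*v2 of two tuples."""
    return tuple(c1 * a + c2 * b for a, b in zip(v1, v2))

def preserves_item_2(f, vectors, scalars):
    """True if f(c1*v1 + c2*v2) == c1*f(v1) + c2*f(v2) on every sample."""
    return all(
        f(combo(c1, v1, c2, v2)) == combo(c1, f(v1), c2, f(v2))
        for v1, v2 in product(vectors, repeat=2)
        for c1, c2 in product(scalars, repeat=2)
    )

vectors = [(1, 0), (0, 1), (2, -3)]
scalars = [0, 1, -2, Fraction(1, 2)]

swap = lambda v: (v[1], v[0])               # exchanges the two components
shift_by_one = lambda v: (v[0] + 1, v[1])   # adds 1 to the first component

print(preserves_item_2(swap, vectors, scalars))          # True
print(preserves_item_2(shift_by_one, vectors, scalars))  # False
```
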

We close with a summary. In the prior chapter, after giving the definition 
of a vector space, we looked at examples and some of them seemed to be 
essentially the same. Here we have defined the relation '$\cong$' and have argued
that it is the right way to say precisely what we mean by 'the same' because it 
preserves the features of interest in a vector space — in particular, it preserves 
linear combinations. In the next section we will show that isomorphism is an 
equivalence relation and so partitions the collection of vector spaces into cases. 

Exercises 

✓ 1.11 Verify, using Example 1.4 as a model, that the two correspondences given before
the definition are isomorphisms.
(a) Example 1.1 (b) Example 1.2
✓ 1.12 For the map $f\colon \mathcal{P}_1 \to \mathbb{R}^2$ given by

$a + bx \;\stackrel{f}{\longmapsto}\; \begin{pmatrix} a - b \\ b \end{pmatrix}$

Find the image of each of these elements of the domain.

(a) 3 - 2x (b) 2 + 2x (c) x
Show that this map is an isomorphism.
1.13 Show that the natural map $f_1$ from Example 1.5 is an isomorphism.
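✓ 1.14 Decide whether each map is an isomorphism (if it is an isomorphism then prove
it and if it isn't then state a condition that it fails to satisfy).
(a) $f\colon \mathcal{M}_{2\times 2} \to \mathbb{R}$ given by

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto ad - bc$

(b) $f\colon \mathcal{M}_{2\times 2} \to \mathbb{R}^4$ given by

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto \begin{pmatrix} a + b + c + d \\ a + b + c \\ a + b \\ a \end{pmatrix}$

(c) $f\colon \mathcal{M}_{2\times 2} \to \mathcal{P}_3$ given by

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto c + (d + c)x + (b + a)x^2 + ax^3$

(d) $f\colon \mathcal{M}_{2\times 2} \to \mathcal{P}_3$ given by

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto c + (d + c)x + (b + a + 1)x^2 + ax^3$

1.15 Show that the map $f\colon \mathbb{R}^1 \to \mathbb{R}^1$ given by $f(x) = x^3$ is one-to-one and onto. Is it
an isomorphism?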

/ 1.16 Refer to Example 1.1. Produce two more isomorphisms (of course, you must 
also verify that they satisfy the conditions in the definition of isomorphism). 
1.17 Refer to Example 1.2. Produce two more isomorphisms (and verify that they 
satisfy the conditions). 
✓ 1.18 Show that, although $\mathbb{R}^2$ is not itself a subspace of $\mathbb{R}^3$, it is isomorphic to the
xy-plane subspace of $\mathbb{R}^3$.
1.19 Find two isomorphisms between $\mathbb{R}^{16}$ and $\mathcal{M}_{4\times 4}$.
✓ 1.20 For what k is $\mathcal{M}_{m\times n}$ isomorphic to $\mathbb{R}^k$?

1.21 For what k is $\mathcal{P}_k$ isomorphic to $\mathbb{R}^n$?

1.22 Prove that the map in Example 1.8, from $\mathcal{P}_5$ to $\mathcal{P}_5$ given by $p(x) \mapsto p(x - 1)$,
is a vector space isomorphism.

1.23 Why, in Lemma 1.9, must there be a v e V? That is, why must V be nonempty? 

1.24 Are any two trivial spaces isomorphic? 

1.25 In the proof of Lemma 1.10, what about the zero-summands case (that is, if n 
is zero)? 

1.26 Show that any isomorphism f : R' has the form a ka for some nonzero 
real number k. 

/ 1.27 These prove that isomorphism is an equivalence relation. 

(a) Show that the identity map $\mathrm{id}\colon V \to V$ is an isomorphism. Thus, any vector
space is isomorphic to itself.
(b) Show that if $f\colon V \to W$ is an isomorphism then so is its inverse $f^{-1}\colon W \to V$.
Thus, if V is isomorphic to W then also W is isomorphic to V.

(c) Show that a composition of isomorphisms is an isomorphism: if $f\colon V \to W$ is
an isomorphism and $g\colon W \to U$ is an isomorphism then so also is $g \circ f\colon V \to U$.
Thus, if V is isomorphic to W and W is isomorphic to U, then also V is isomorphic
to U.

1.28 Suppose that $f\colon V \to W$ preserves structure. Show that f is one-to-one if and
only if the unique member of V mapped by f to $\vec{0}_W$ is $\vec{0}_V$.

1.29 Suppose that $f\colon V \to W$ is an isomorphism. Prove that the set $\{\vec{v}_1, \ldots, \vec{v}_k\} \subseteq V$
is linearly dependent if and only if the set of images $\{f(\vec{v}_1), \ldots, f(\vec{v}_k)\} \subseteq W$ is
linearly dependent.

/ 1.30 Show that each type of map from Example 1.7 is an automorphism. 

(a) Dilation ds by a nonzero scalar s. 

(b) Rotation te through an angle 6. 

(c) Reflection fe over a line through the origin. 

Hint. For the second and third items, polar coordinates are useful. 

1.31 Produce an automorphism of other than the identity map, and other than a 
shift map p(x) i-^ p(x — k). 

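1.32 (a) Show that a function $f\colon \mathbb{R}^1 \to \mathbb{R}^1$ is an automorphism if and only if it has
the form $x \mapsto kx$ for some $k \neq 0$.

(b) Let f be an automorphism of $\mathbb{R}^1$ such that f(3) = 7. Find f(-2).

(c) Show that a function $f\colon \mathbb{R}^2 \to \mathbb{R}^2$ is an automorphism if and only if it has the
form

$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix}$

for some $a, b, c, d \in \mathbb{R}$ with $ad - bc \neq 0$. Hint. Exercises in prior subsections
have shown that

$\begin{pmatrix} b \\ d \end{pmatrix}$ is not a multiple of $\begin{pmatrix} a \\ c \end{pmatrix}$

if and only if $ad - bc \neq 0$.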
(d) Let f be an automorphism of R^ with 



Find 



-1, 

1.33 Refer to Lemma 1.9 and Lemma 1.10. Find two more things preserved by 
isomorphism. 

1.34 We show that isomorphisms can be tailored to fit in that, sometimes, given 
vectors in the domain and in the range we can produce an isomorphism associating 
those vectors. 

(a) Let $B = \langle \vec{\beta}_1, \vec{\beta}_2, \vec{\beta}_3 \rangle$ be a basis for $\mathcal{P}_2$ so that any $\vec{p} \in \mathcal{P}_2$ has a unique
representation as $\vec{p} = c_1\vec{\beta}_1 + c_2\vec{\beta}_2 + c_3\vec{\beta}_3$, which we denote in this way.

$\mathrm{Rep}_B(\vec{p}) = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}$

Show that the $\mathrm{Rep}_B(\cdot)$ operation is a function from $\mathcal{P}_2$ to $\mathbb{R}^3$ (this entails showing
that with every domain vector $\vec{v} \in \mathcal{P}_2$ there is an associated image vector in $\mathbb{R}^3$,
and further, that with every domain vector $\vec{v} \in \mathcal{P}_2$ there is at most one associated
image vector).
(b) Show that this $\mathrm{Rep}_B(\cdot)$ function is one-to-one and onto.

(c) Show that it preserves structure.

(d) Produce an isomorphism from $\mathcal{P}_2$ to $\mathbb{R}^3$ that fits these specifications.



X + and 1 - x 1-^ 1 





1.35 Prove that a space is n-dimensional if and only if it is isomorphic to $\mathbb{R}^n$. Hint.
Fix a basis B for the space and consider the map sending a vector over to its
representation with respect to B.

1.36 (Requires the subsection on Combining Subspaces, which is optional.) Let
U and W be vector spaces. Define a new vector space, consisting of the set
$U \times W = \{(\vec{u}, \vec{w}) \mid \vec{u} \in U \text{ and } \vec{w} \in W\}$ along with these operations.

$(\vec{u}_1, \vec{w}_1) + (\vec{u}_2, \vec{w}_2) = (\vec{u}_1 + \vec{u}_2, \vec{w}_1 + \vec{w}_2)$ and $r \cdot (\vec{u}, \vec{w}) = (r\vec{u}, r\vec{w})$

This is a vector space, the external direct sum of U and W.

(a) Check that it is a vector space.

(b) Find a basis for, and the dimension of, the external direct sum $\mathcal{P}_2 \times \mathbb{R}^2$.

(c) What is the relationship among dim(U), dim(W), and dim(U x W)? 

(d) Suppose that U and W are subspaces of a vector space V such that $V = U \oplus W$
(in this case we say that V is the internal direct sum of U and W). Show that
the map $f\colon U \times W \to V$ given by

$(\vec{u}, \vec{w}) \;\stackrel{f}{\longmapsto}\; \vec{u} + \vec{w}$

is an isomorphism. Thus if the internal direct sum is defined then the internal
and external direct sums are isomorphic.



I.2 Dimension Characterizes Isomorphism

In the prior subsection, after stating the definition of an isomorphism, we gave
some results supporting the intuition that such a map describes spaces as "the
same." Here we will develop this intuition. When two spaces that are isomorphic
are not equal, we think of them as almost equal, as equivalent. We shall show
that the relationship 'is isomorphic to' is an equivalence relation.*

2.1 Lemma The inverse of an isomorphism is also an isomorphism.

Proof Suppose that V is isomorphic to W via $f\colon V \to W$. Because an isomorphism
is a correspondence, f has an inverse function $f^{-1}\colon W \to V$ that is also a
correspondence.†

To finish we will show that because f preserves linear combinations, so also
does $f^{-1}$. Let $\vec{w}_1 = f(\vec{v}_1)$ and $\vec{w}_2 = f(\vec{v}_2)$.

$f^{-1}(c_1 \cdot \vec{w}_1 + c_2 \cdot \vec{w}_2) = f^{-1}\bigl( c_1 \cdot f(\vec{v}_1) + c_2 \cdot f(\vec{v}_2) \bigr)$

$= f^{-1}\bigl( f(c_1\vec{v}_1 + c_2\vec{v}_2) \bigr)$

$= c_1\vec{v}_1 + c_2\vec{v}_2$

$= c_1 \cdot f^{-1}(\vec{w}_1) + c_2 \cdot f^{-1}(\vec{w}_2)$

since $f^{-1}(\vec{w}_1) = \vec{v}_1$ and $f^{-1}(\vec{w}_2) = \vec{v}_2$. With that, by Lemma 1.10 this map
preserves structure. QED

* More information on equivalence relations and equivalence classes is in the appendix.
† More information on inverse functions is in the appendix.

2.2 Theorem Isomorphism is an equivalence relation between vector spaces. 

Proof We must prove that the relation is symmetric, reflexive, and transitive.

To check reflexivity, that any space is isomorphic to itself, consider the
identity map. It is clearly one-to-one and onto. This calculation shows that it
also preserves linear combinations.

$\mathrm{id}(c_1 \cdot \vec{v}_1 + c_2 \cdot \vec{v}_2) = c_1\vec{v}_1 + c_2\vec{v}_2 = c_1 \cdot \mathrm{id}(\vec{v}_1) + c_2 \cdot \mathrm{id}(\vec{v}_2)$

Symmetry, that if V is isomorphic to W then also W is isomorphic to V,
holds by Lemma 2.1 since an isomorphism map from V to W is paired with an
isomorphism from W to V.

Finally, we must check transitivity, that if V is isomorphic to W and if W is
isomorphic to U then also V is isomorphic to U. Let $f\colon V \to W$ and $g\colon W \to U$
be isomorphisms and consider the composition $g \circ f\colon V \to U$. The composition of
correspondences is a correspondence so we need only check that the composition
preserves linear combinations.

$(g \circ f)(c_1 \cdot \vec{v}_1 + c_2 \cdot \vec{v}_2) = g\bigl( f(c_1 \cdot \vec{v}_1 + c_2 \cdot \vec{v}_2) \bigr)$

$= g\bigl( c_1 \cdot f(\vec{v}_1) + c_2 \cdot f(\vec{v}_2) \bigr)$

$= c_1 \cdot g(f(\vec{v}_1)) + c_2 \cdot g(f(\vec{v}_2))$

$= c_1 \cdot (g \circ f)(\vec{v}_1) + c_2 \cdot (g \circ f)(\vec{v}_2)$

Thus the composition is an isomorphism. QED

Therefore, isomorphism partitions the universe of vector spaces into classes. 
Every space is in one and only one isomorphism class. 



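[Figure: all finite-dimensional vector spaces, partitioned into isomorphism classes; two spaces V and W fall in the same class when $V \cong W$.]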



2.3 Theorem Vector spaces are isomorphic if and only if they have the same 
dimension. 
We've broken the proof into two halves.

2.4 Lemma If spaces are isomorphic then they have the same dimension. 

Proof We shall show that an isomorphism of two spaces gives a correspondence
between their bases. That is, we shall show that if $f\colon V \to W$ is an
isomorphism and a basis for the domain V is $B = \langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$, then the image
set $D = \langle f(\vec{\beta}_1), \ldots, f(\vec{\beta}_n) \rangle$ is a basis for the codomain W. The other half of
the correspondence, that for any basis of W the inverse image is a basis for
V, follows from Lemma 2.1, that if f is an isomorphism then $f^{-1}$ is also an
isomorphism, and applying the prior sentence to $f^{-1}$.

To see that D spans W, fix any $\vec{w} \in W$, note that f is onto and so there is a
$\vec{v} \in V$ with $\vec{w} = f(\vec{v})$, and expand $\vec{v}$ as a combination of basis vectors.

$\vec{w} = f(\vec{v}) = f(v_1\vec{\beta}_1 + \cdots + v_n\vec{\beta}_n) = v_1 \cdot f(\vec{\beta}_1) + \cdots + v_n \cdot f(\vec{\beta}_n)$

For linear independence of D, if

$\vec{0}_W = c_1 f(\vec{\beta}_1) + \cdots + c_n f(\vec{\beta}_n) = f(c_1\vec{\beta}_1 + \cdots + c_n\vec{\beta}_n)$

then, since f is one-to-one and so the only vector sent to $\vec{0}_W$ is $\vec{0}_V$, we have that
$\vec{0}_V = c_1\vec{\beta}_1 + \cdots + c_n\vec{\beta}_n$, implying that all of the c's are zero. QED

2.5 Lemma If spaces have the same dimension then they are isomorphic. 

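Proof We will prove that any space of dimension n is isomorphic to $\mathbb{R}^n$. Then
we will have that all such spaces are isomorphic to each other by transitivity,
which was shown in Theorem 2.2.

Let V be n-dimensional. Fix a basis $B = \langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ for the domain V.
Consider the operation of representing the members of V with respect to B as a
function from V to $\mathbb{R}^n$.

$\vec{v} = v_1\vec{\beta}_1 + \cdots + v_n\vec{\beta}_n \;\stackrel{\mathrm{Rep}_B}{\longmapsto}\; \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}$

(It is well-defined* since every $\vec{v}$ has one and only one such representation;
see Remark 2.6 below.)

This function is one-to-one because if

$\mathrm{Rep}_B(u_1\vec{\beta}_1 + \cdots + u_n\vec{\beta}_n) = \mathrm{Rep}_B(v_1\vec{\beta}_1 + \cdots + v_n\vec{\beta}_n)$

then

$\begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}$

and so $u_1 = v_1, \ldots, u_n = v_n$, implying that the original arguments
$u_1\vec{\beta}_1 + \cdots + u_n\vec{\beta}_n$ and $v_1\vec{\beta}_1 + \cdots + v_n\vec{\beta}_n$ are equal.

This function is onto; any member of $\mathbb{R}^n$

$\vec{w} = \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix}$

is the image of some $\vec{v} \in V$, namely $\vec{w} = \mathrm{Rep}_B(w_1\vec{\beta}_1 + \cdots + w_n\vec{\beta}_n)$.

Finally, this function preserves structure.

$\mathrm{Rep}_B(r \cdot \vec{u} + s \cdot \vec{v}) = \mathrm{Rep}_B\bigl( (ru_1 + sv_1)\vec{\beta}_1 + \cdots + (ru_n + sv_n)\vec{\beta}_n \bigr)$

$= \begin{pmatrix} ru_1 + sv_1 \\ \vdots \\ ru_n + sv_n \end{pmatrix}$

$= r \cdot \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} + s \cdot \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}$

$= r \cdot \mathrm{Rep}_B(\vec{u}) + s \cdot \mathrm{Rep}_B(\vec{v})$

Thus, the $\mathrm{Rep}_B$ function is an isomorphism and therefore any n-dimensional
space is isomorphic to $\mathbb{R}^n$. QED

* More information on well-defined is in the appendix.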
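The representation map in this proof can be computed by solving a linear system. The following is a minimal sketch, assuming that vectors of an n-dimensional space have already been written as length-n tuples of coefficients; the function name `rep` and the sample basis (the polynomials 1, 1 + x, 1 + x + x^2, stored by their coefficients) are illustrative assumptions rather than anything taken from the text. It uses Gauss-Jordan reduction with exact fractions, and it assumes without checking that the given list really is a basis.

```python
from fractions import Fraction

def rep(basis, v):
    """Coordinates c with c[0]*basis[0] + ... + c[n-1]*basis[n-1] == v.

    basis: list of n vectors, each of length n, assumed linearly independent.
    """
    n = len(basis)
    # Augmented matrix: the columns are the basis vectors, followed by v.
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
            for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        # Clear the rest of the column.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return tuple(row[n] for row in rows)

B = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]    # 1, 1 + x, 1 + x + x^2
u = (3, 2, 1)                             # 3 + 2x + x^2
v = (0, 1, 0)                             # x

print(rep(B, u) == (1, 1, 1))             # True: u = 1*b1 + 1*b2 + 1*b3

# Structure preservation on one sample: Rep(2u + 3v) = 2 Rep(u) + 3 Rep(v).
lhs = rep(B, tuple(2 * a + 3 * b for a, b in zip(u, v)))
rhs = tuple(2 * a + 3 * b for a, b in zip(rep(B, u), rep(B, v)))
assert lhs == rhs
```
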

2.6 Remark The parenthetical comment in that proof about the role played by
the 'one and only one representation' result can do with some amplification. A
contrasting example, where an association doesn't have this property, will help
illuminate the issue. Consider this subset of $\mathcal{P}_2$, which is not a basis.

$A = \{1 + 0x + 0x^2,\ 0 + 1x + 0x^2,\ 0 + 0x + 1x^2,\ 1 + 1x + 2x^2\}$

Call those polynomials $\vec{\alpha}_1, \ldots, \vec{\alpha}_4$. If, as in the proof, we try to write the
members of $\mathcal{P}_2$ as $\vec{p} = c_1\vec{\alpha}_1 + c_2\vec{\alpha}_2 + c_3\vec{\alpha}_3 + c_4\vec{\alpha}_4$ in order to associate $\vec{p}$ with
the 4-tall vector with components $c_1, \ldots, c_4$ then we have a problem. For,
consider $\vec{p}(x) = 1 + x + x^2$. We have both

$\vec{p}(x) = 1\vec{\alpha}_1 + 1\vec{\alpha}_2 + 1\vec{\alpha}_3 + 0\vec{\alpha}_4$ and $\vec{p}(x) = 0\vec{\alpha}_1 + 0\vec{\alpha}_2 - 1\vec{\alpha}_3 + 1\vec{\alpha}_4$

so we are trying to associate $\vec{p}$ with more than one 4-tall vector

$\begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 0 \\ -1 \\ 1 \end{pmatrix}$

(of course, $\vec{p}$'s decomposition is not unique because A is not linearly independent).
That is, the input $\vec{p}$ is not associated with a well-defined (i.e., unique) output
value.

In general, any time that we define a function we must check that output
values are well-defined. In the above proof we must check that for a fixed B each
vector in the domain is associated by $\mathrm{Rep}_B$ with one and only one vector in the
codomain. That check is Exercise 19.

We say that the isomorphism classes are characterized by dimension because 
we can describe each class simply by giving the number that is the dimension of 
all of the spaces in that class. 

2.7 Corollary A finite-dimensional vector space is isomorphic to one and only one
of the $\mathbb{R}^n$.



This gives us a collection of representatives of the isomorphism classes. 



[Figure: all finite-dimensional vector spaces, partitioned into isomorphism classes, with one representative per class.]




The proofs above pack many ideas into a small space. Through the rest of 
this chapter we'll consider these ideas again, and fill them out. For a taste of 
this, we will expand here on the proof of Lemma 2.5. 

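2.8 Example The space $\mathcal{M}_{2\times 2}$ of 2x2 matrices is isomorphic to $\mathbb{R}^4$. With this
basis for the domain

$B = \langle \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \rangle$

the isomorphism given in the lemma, the representation map $f_1 = \mathrm{Rep}_B$, carries
the entries over.

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \;\stackrel{f_1}{\longmapsto}\; \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}$

One way to think of the map $f_1$ is: fix the basis B for the domain and the basis
$\mathcal{E}_4$ for the codomain, and associate $\vec{\beta}_1$ with $\vec{e}_1$, and $\vec{\beta}_2$ with $\vec{e}_2$, etc. Then
extend this association to all of the members of the two spaces.

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} = a\vec{\beta}_1 + b\vec{\beta}_2 + c\vec{\beta}_3 + d\vec{\beta}_4 \;\stackrel{f_1}{\longmapsto}\; a\vec{e}_1 + b\vec{e}_2 + c\vec{e}_3 + d\vec{e}_4 = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}$

We say that the map has been extended linearly from the bases to the spaces.

We can do the same thing with different bases, for instance, taking this basis
for the domain.

$A = \langle \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 2 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix} \rangle$

Associating corresponding members of A and $\mathcal{E}_4$ and extending linearly

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} = (a/2)\vec{\alpha}_1 + (b/2)\vec{\alpha}_2 + (c/2)\vec{\alpha}_3 + (d/2)\vec{\alpha}_4 \;\longmapsto\; (a/2)\vec{e}_1 + (b/2)\vec{e}_2 + (c/2)\vec{e}_3 + (d/2)\vec{e}_4 = \begin{pmatrix} a/2 \\ b/2 \\ c/2 \\ d/2 \end{pmatrix}$

gives rise to an isomorphism that is different from $f_1$.

The prior map arose by changing the basis for the domain. We can also
change the basis for the codomain. Starting with

B and $D = \langle \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} \rangle$

associating $\vec{\beta}_1$ with $\vec{\delta}_1$, etc., and then linearly extending that correspondence to
all of the two spaces

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} = a\vec{\beta}_1 + b\vec{\beta}_2 + c\vec{\beta}_3 + d\vec{\beta}_4 \;\longmapsto\; a\vec{\delta}_1 + b\vec{\delta}_2 + c\vec{\delta}_3 + d\vec{\delta}_4 = \begin{pmatrix} a \\ b \\ d \\ c \end{pmatrix}$

gives still another isomorphism.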

We close this section with a summary. Recall that in the first chapter we 
defined two matrices as row equivalent if they can be derived from each other 
by elementary row operations. We showed that is an equivalence relation and so 
the collection of matrices is partitioned into classes, where all the matrices that 
are row equivalent fall together into a single class. Then, for insight into which 
matrices are in each class, we gave representatives for the classes, the reduced 
echelon form matrices. 

In this section we have followed that outline, except that the appropriate no- 
tion of sameness here is vector space isomorphism. First we defined isomorphism, 
saw some examples, and established some properties. As before, we developed 
a list of class representatives to help us understand the partition. It is just a 
classification of spaces by dimension. 

In the second chapter, with the definition of vector spaces, we seemed to 
have opened up our studies to many examples of new structures besides the 
familiar $\mathbb{R}^n$'s. We now know that isn't the case. Any finite-dimensional vector
space is actually "the same" as a real space. We are thus considering exactly the 
structures that we need to consider. 

Exercises 

/ 2.9 Decide if the spaces are isomorphic. 
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(a) R\ (b) 7s, (c) Mzxs, K'^ (d) ^s, M2X3 

(e) M2xk, 

✓ 2.10 Consider the isomorphism $\mathrm{Rep}_B(\cdot)\colon \mathcal{P}_1 \to \mathbb{R}^2$ where $B = \langle 1, 1 + x \rangle$. Find the
image of each of these elements of the domain.
(a) 3 - 2x; (b) 2 + 2x; (c) x
✓ 2.11 Show that if $m \neq n$ then $\mathbb{R}^m \not\cong \mathbb{R}^n$.

✓ 2.12 Is $\mathcal{M}_{m\times n} \cong \mathcal{M}_{n\times m}$?

✓ 2.13 Are any two planes through the origin in $\mathbb{R}^3$ isomorphic?

2.14 Find a set of equivalence class representatives other than the set of $\mathbb{R}^n$'s.

2.15 True or false: between any n-dimensional space and $\mathbb{R}^n$ there is exactly one
isomorphism.

2.16 Can a vector space be isomorphic to one of its (proper) subspaces?

✓ 2.17 This subsection shows that for any isomorphism, the inverse map is also an
isomorphism. This subsection also shows that for a fixed basis B of an n-dimensional
vector space V, the map $\mathrm{Rep}_B\colon V \to \mathbb{R}^n$ is an isomorphism. Find the inverse of
this map.

/ 2.18 Prove these facts about matrices. 

(a) The row space of a matrix is isomorphic to the column space of its transpose. 

(b) The row space of a matrix is isomorphic to its column space. 

2.19 Show that the function from Theorem 2.3 is well-defined. 

2.20 Is the proof of Theorem 2.3 valid when n = 0?

2.21 For each, decide if it is a set of isomorphism class representatives. 

(a) I keN} 

(b) I ke{-1,0,1,...}} 

(c) {M,nxn I m,n e N} 

2.22 Let f be a correspondence between vector spaces V and W (that is, a map that
is one-to-one and onto). Show that the spaces V and W are isomorphic via f if and
only if there are bases $B \subset V$ and $D \subset W$ such that corresponding vectors have the
same coordinates: $\mathrm{Rep}_B(\vec{v}) = \mathrm{Rep}_D(f(\vec{v}))$.

2.23 Consider the isomorphism $\mathrm{Rep}_B\colon \mathcal{P}_3 \to \mathbb{R}^4$.

(a) Vectors in a real space are orthogonal if and only if their dot product is zero.
Give a definition of orthogonality for polynomials.

(b) The derivative of a member of $\mathcal{P}_3$ is in $\mathcal{P}_3$. Give a definition of the derivative
of a vector in $\mathbb{R}^4$.

✓ 2.24 Does every correspondence between bases, when extended to the spaces, give
an isomorphism?

2.25 (Requires the subsection on Combining Subspaces, which is optional.) Suppose
that $V = V_1 \oplus V_2$ and that V is isomorphic to the space U under the map f.
Show that $U = f(V_1) \oplus f(V_2)$.

2.26 Show that this is not a well-defined function from the rational numbers to the 
integers: with each fraction, associate the value of its numerator. 
II Homomorphisms

The definition of isomorphism has two conditions. In this section we will consider 
the second one. We will study maps that are required only to preserve structure, 
maps that are not also required to be correspondences. 

Experience shows that these maps are tremendously useful. For one thing 
we shall see in the second subsection below that while isomorphisms describe 
how spaces are the same, we can think of these maps as describing how spaces 
are alike. 



II.1 Definition



1.1 Definition A function between vector spaces $h\colon V \to W$ that preserves the
operations of addition

if $\vec{v}_1, \vec{v}_2 \in V$ then $h(\vec{v}_1 + \vec{v}_2) = h(\vec{v}_1) + h(\vec{v}_2)$

and scalar multiplication

if $\vec{v} \in V$ and $r \in \mathbb{R}$ then $h(r \cdot \vec{v}) = r \cdot h(\vec{v})$

is a homomorphism or linear map.

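1.2 Example The projection map $\pi\colon \mathbb{R}^3 \to \mathbb{R}^2$

$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\stackrel{\pi}{\longmapsto}\; \begin{pmatrix} x \\ y \end{pmatrix}$

is a homomorphism. It preserves addition

$\pi\bigl( \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} + \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix} \bigr) = \pi\bigl( \begin{pmatrix} x_1 + x_2 \\ y_1 + y_2 \\ z_1 + z_2 \end{pmatrix} \bigr) = \begin{pmatrix} x_1 + x_2 \\ y_1 + y_2 \end{pmatrix} = \pi\bigl( \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} \bigr) + \pi\bigl( \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix} \bigr)$

and scalar multiplication.

$\pi\bigl( r \cdot \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} \bigr) = \pi\bigl( \begin{pmatrix} rx_1 \\ ry_1 \\ rz_1 \end{pmatrix} \bigr) = \begin{pmatrix} rx_1 \\ ry_1 \end{pmatrix} = r \cdot \pi\bigl( \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} \bigr)$

This is not an isomorphism since it is not one-to-one. For instance, both $\vec{0}$ and
$\vec{e}_3$ in $\mathbb{R}^3$ map to the zero vector in $\mathbb{R}^2$.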
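The two halves of Example 1.2 can be seen side by side in a few lines. This is a minimal sketch, assuming vectors are stored as tuples; the helper names `proj`, `add`, and `scale` are illustrative. The first two assertions spot-check structure preservation on one sample, and the last shows two distinct inputs with the same image, which is what rules out one-to-one-ness.

```python
def proj(v):
    """Projection from three-tall to two-tall vectors: drop the third component."""
    x, y, z = v
    return (x, y)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(r, v):
    return tuple(r * a for a in v)

u, v, r = (1, 2, 3), (4, -5, 6), 7

# Structure is preserved on this sample.
assert proj(add(u, v)) == add(proj(u), proj(v))
assert proj(scale(r, u)) == scale(r, proj(u))

# Not one-to-one: the zero vector and e_3 are different but share an image.
zero, e3 = (0, 0, 0), (0, 0, 1)
assert zero != e3 and proj(zero) == proj(e3) == (0, 0)
```
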

1.3 Example Of course, the domain and codomain can be other than spaces 
of column vectors. Both of these are homomorphisms; the verifications are 
straightforward. 






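(1) $f_1\colon \mathcal{P}_2 \to \mathcal{P}_3$ given by

$a_0 + a_1x + a_2x^2 \mapsto a_0x + (a_1/2)x^2 + (a_2/3)x^3$

(2) $f_2\colon \mathcal{M}_{2\times 2} \to \mathbb{R}$ given by

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto a + d$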

1.4 Example Between any two spaces there is a zero homomorphism, mapping 
every vector in the domain to the zero vector in the codomain. 

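1.5 Example These two suggest why we use the term 'linear map'.
(1) The map $g\colon \mathbb{R}^3 \to \mathbb{R}$ given by

$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\stackrel{g}{\longmapsto}\; 3x + 2y - 4.5z$

is linear, that is, is a homomorphism. In contrast, the map $\hat{g}\colon \mathbb{R}^3 \to \mathbb{R}$
given by

$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\stackrel{\hat{g}}{\longmapsto}\; 3x + 2y - 4.5z + 1$

is not.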

To show that a map is not linear we need only produce a single linear 
combination that the map does not preserve. 

(2) The first of these two maps ti , ti : is linear while the second is 

not. 

x^ 




, Zi 



ti /5x — 2y\ ±2 /5x — 2y 



Finding a linear combination that the second map does not preserve is 
easy. 

The homomorphisms have coordinate functions that are linear combinations of 
the arguments. 
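
As a quick sanity check, a short program can search a handful of sample vectors and scalars for a linear combination that a map fails to preserve. This is a minimal sketch: the helper name find_counterexample and the sample lists are our own choices, and finding no failure among finitely many samples only suggests linearity; it does not prove it. Finding a single failure, however, does prove that the map is not linear.

```python
from fractions import Fraction
from itertools import product

def g(v):
    """The map (x, y, z) -> 3x + 2y - 4.5z."""
    x, y, z = v
    return 3 * x + 2 * y - Fraction(9, 2) * z

def g_hat(v):
    """The same formula shifted by the constant 1."""
    return g(v) + 1

def find_counterexample(f, vectors, scalars):
    """Return (c1, v1, c2, v2) where f(c1*v1 + c2*v2) differs from
    c1*f(v1) + c2*f(v2), or None if no sample combination fails."""
    for v1, v2 in product(vectors, repeat=2):
        for c1, c2 in product(scalars, repeat=2):
            combo = tuple(c1 * a + c2 * b for a, b in zip(v1, v2))
            if f(combo) != c1 * f(v1) + c2 * f(v2):
                return c1, v1, c2, v2
    return None

# Hypothetical sample data, chosen only for illustration.
samples = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (2, -1, 3)]
scalars = [Fraction(1), Fraction(-2), Fraction(1, 3)]

print(find_counterexample(g, samples, scalars))      # None
print(find_counterexample(g_hat, samples, scalars))  # some failing combination
```

Exact fractions are used instead of floating point so that the equality test is not disturbed by rounding.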

Any isomorphism is a homomorphism, since an isomorphism is a homomorphism
that is also a correspondence. So one way to think of 'homomorphism' is
as a generalization of 'isomorphism' motivated by the observation that many
of the properties of isomorphisms have only to do with the map's structure
preservation property and not to do with being a correspondence. The next
two results are examples of that thinking. The proof for each given in the prior
section does not use one-to-one-ness or onto-ness and therefore applies here.

1.6 Lemma A homomorphism sends a zero vector to a zero vector. 



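1.7 Lemma For any map $f\colon V \to W$ between vector spaces, the following are
equivalent.

(1) $f$ is a homomorphism

(2) $f(c_1 \cdot \vec{v}_1 + c_2 \cdot \vec{v}_2) = c_1 \cdot f(\vec{v}_1) + c_2 \cdot f(\vec{v}_2)$ for any $c_1, c_2 \in \mathbb{R}$ and $\vec{v}_1, \vec{v}_2 \in V$

(3) $f(c_1 \cdot \vec{v}_1 + \cdots + c_n \cdot \vec{v}_n) = c_1 \cdot f(\vec{v}_1) + \cdots + c_n \cdot f(\vec{v}_n)$ for any $c_1, \ldots, c_n \in \mathbb{R}$
and $\vec{v}_1, \ldots, \vec{v}_n \in V$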



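1.8 Example The function $f\colon \mathbb{R}^2 \to \mathbb{R}^4$ given by

$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x/2 \\ 0 \\ x + y \\ 3y \end{pmatrix}$

is linear since it satisfies item (2).

$\begin{pmatrix} r_1(x_1/2) + r_2(x_2/2) \\ 0 \\ r_1(x_1 + y_1) + r_2(x_2 + y_2) \\ r_1(3y_1) + r_2(3y_2) \end{pmatrix} = r_1 \begin{pmatrix} x_1/2 \\ 0 \\ x_1 + y_1 \\ 3y_1 \end{pmatrix} + r_2 \begin{pmatrix} x_2/2 \\ 0 \\ x_2 + y_2 \\ 3y_2 \end{pmatrix}$
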



However, some of the things that we have seen for isomorphisms fail to hold 
for homomorphisms in general. One example is the proof of Lemma 1.2.4, which 
shows that an isomorphism between spaces gives a correspondence between 
their bases. Homomorphisms do not give any such correspondence; Example 1.2 
shows this and another example is the zero map between two nontrivial spaces. 
Instead, for homomorphisms a weaker but still very useful result holds. 



1.9 Theorem A homomorphism is determined by its action on a basis: if $\langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$
is a basis of a vector space $V$ and $\vec{w}_1, \ldots, \vec{w}_n$ are elements of a vector space $W$
(perhaps not distinct elements) then there exists a homomorphism from $V$ to $W$
sending each $\vec{\beta}_i$ to $\vec{w}_i$, and that homomorphism is unique.

Proof We will define the map by associating each $\vec{\beta}_i$ with $\vec{w}_i$ and then extending
linearly to all of the domain. That is, given the input $\vec{v}$, we find its coordinates
with respect to the basis $\vec{v} = c_1\vec{\beta}_1 + \cdots + c_n\vec{\beta}_n$ and define the associated
output by using the same $c_i$ coordinates $h(\vec{v}) = c_1\vec{w}_1 + \cdots + c_n\vec{w}_n$. This is a
well-defined function because, with respect to the basis, the representation of
each domain vector $\vec{v}$ is unique.

This map is a homomorphism since it preserves linear combinations; where
$\vec{v}_1 = c_1\vec{\beta}_1 + \cdots + c_n\vec{\beta}_n$ and $\vec{v}_2 = d_1\vec{\beta}_1 + \cdots + d_n\vec{\beta}_n$ then we have this.

$h(r_1\vec{v}_1 + r_2\vec{v}_2) = h((r_1c_1 + r_2d_1)\vec{\beta}_1 + \cdots + (r_1c_n + r_2d_n)\vec{\beta}_n)$

$= (r_1c_1 + r_2d_1)\vec{w}_1 + \cdots + (r_1c_n + r_2d_n)\vec{w}_n$

$= r_1h(\vec{v}_1) + r_2h(\vec{v}_2)$

This map is unique since if $\hat{h}\colon V \to W$ is another homomorphism satisfying
that $\hat{h}(\vec{\beta}_i) = \vec{w}_i$ for each $i$, then $h$ and $\hat{h}$ agree on all of the vectors in the
domain.

$\hat{h}(\vec{v}) = \hat{h}(c_1\vec{\beta}_1 + \cdots + c_n\vec{\beta}_n) = c_1\hat{h}(\vec{\beta}_1) + \cdots + c_n\hat{h}(\vec{\beta}_n)$

$= c_1\vec{w}_1 + \cdots + c_n\vec{w}_n = h(\vec{v})$

Thus, $h$ and $\hat{h}$ are the same map. QED

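1.10 Example If we specify a map $h\colon \mathbb{R}^2 \to \mathbb{R}^2$ that acts on the standard basis
$\mathcal{E}_2$ in this way

$h\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \end{pmatrix} \qquad h\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -4 \\ 4 \end{pmatrix}$

then we have also specified the action of $h$ on any other member of the domain.
For instance, the value of $h$ on this argument

$h\begin{pmatrix} 3 \\ -2 \end{pmatrix} = h\left(3 \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} - 2 \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right) = 3 \cdot h\begin{pmatrix} 1 \\ 0 \end{pmatrix} - 2 \cdot h\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 5 \\ -5 \end{pmatrix}$

is a direct consequence of the value of $h$ on the basis vectors.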

So we can construct a homomorphism by selecting a basis for the domain
and specifying where the map sends those basis vectors. The prior theorem shows
that we can always extend the action of the map linearly to the entire domain.
Later in this chapter we shall develop a convenient scheme for computations like
this one, using matrices.
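
The extension can be sketched in a few lines of code. The sketch below assumes that the domain is $\mathbb{R}^n$ with its standard basis, so that the coordinates of a vector are just its components; the function name extend_linearly and the particular images chosen for the basis vectors are ours, for illustration only.

```python
def extend_linearly(basis_images):
    """Given the images h(e_1), ..., h(e_n) of the standard basis of R^n,
    return the map h obtained by extending linearly.

    With respect to the standard basis the coordinates of v are its
    components, so h(v) = v_1*h(e_1) + ... + v_n*h(e_n).
    """
    m = len(basis_images[0])  # number of components of each output vector

    def h(v):
        if len(v) != len(basis_images):
            raise ValueError("input has the wrong number of components")
        return tuple(
            sum(c * w[i] for c, w in zip(v, basis_images))
            for i in range(m)
        )

    return h

# Illustrative choice of images for e_1 and e_2 (an assumption of this sketch).
h = extend_linearly([(-1, 1), (-4, 4)])
print(h((1, 0)))   # (-1, 1)
print(h((0, 1)))   # (-4, 4)
print(h((3, -2)))  # (5, -5)
```

For a basis other than the standard one we would first have to solve for the coordinates of the input, which is where the matrix methods developed later help.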

Just as the isomorphisms of a space with itself are useful and interesting, so 
too are the homomorphisms of a space with itself. 

1.11 Definition A linear map from a space into itself $t\colon V \to V$ is a linear
transformation.

1.12 Remark In this book we use 'linear transformation' only in the case where
the codomain equals the domain, but it is often used instead as a synonym for
'homomorphism'.

1.13 Example The map on $\mathbb{R}^2$ that projects all vectors down to the x-axis

$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x \\ 0 \end{pmatrix}$

is a linear transformation.

1.14 Example The derivative map $d/dx\colon \mathcal{P}_n \to \mathcal{P}_n$

$a_0 + a_1x + \cdots + a_nx^n \overset{d/dx}{\longmapsto} a_1 + 2a_2x + 3a_3x^2 + \cdots + na_nx^{n-1}$

is a linear transformation as this result from calculus shows: $d(c_1f + c_2g)/dx =
c_1\,(df/dx) + c_2\,(dg/dx)$.
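
To see the calculus fact at work on polynomials, we can represent $a_0 + a_1x + \cdots + a_nx^n$ by its list of coefficients and differentiate term by term. This is a minimal sketch; the list representation and the function names derivative and combine are assumptions made for illustration.

```python
from fractions import Fraction

def derivative(coeffs):
    """Differentiate a0 + a1*x + ... + an*x^n, given as [a0, a1, ..., an].

    The result is padded with a trailing 0 so that it has the same length
    as the input, which lets us treat d/dx as a map from P_n to P_n.
    """
    return [k * coeffs[k] for k in range(1, len(coeffs))] + [0]

def combine(c1, p, c2, q):
    """The linear combination c1*p + c2*q of two equal-length coefficient lists."""
    return [c1 * a + c2 * b for a, b in zip(p, q)]

p = [1, 2, 0, 4]    # 1 + 2x + 4x^3
q = [0, -3, 5, 1]   # -3x + 5x^2 + x^3
c1, c2 = Fraction(1, 2), 3

lhs = derivative(combine(c1, p, c2, q))               # d(c1*p + c2*q)/dx
rhs = combine(c1, derivative(p), c2, derivative(q))   # c1*dp/dx + c2*dq/dx
print(lhs == rhs)  # True
```

One pair of polynomials is of course only an illustration; the general statement is the result from calculus quoted above.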

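1.15 Example The matrix transpose operation

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto \begin{pmatrix} a & c \\ b & d \end{pmatrix}$

is a linear transformation of $\mathcal{M}_{2 \times 2}$. (Transpose is one-to-one and onto and so in
fact it is an automorphism.)

We finish this subsection about maps by recalling that we can linearly combine
maps. For instance, for these maps from $\mathbb{R}^2$ to itself

$\begin{pmatrix} x \\ y \end{pmatrix} \overset{f}{\longmapsto} \begin{pmatrix} 2x \\ 3x - 2y \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} x \\ y \end{pmatrix} \overset{g}{\longmapsto} \begin{pmatrix} 0 \\ 5x \end{pmatrix}$

the linear combination $5f - 2g$ is also a map from $\mathbb{R}^2$ to itself.

$\begin{pmatrix} x \\ y \end{pmatrix} \overset{5f - 2g}{\longmapsto} \begin{pmatrix} 10x \\ 5x - 10y \end{pmatrix}$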

1.16 Lemma For vector spaces $V$ and $W$, the set of linear functions from $V$ to
$W$ is itself a vector space, a subspace of the space of all functions from $V$ to $W$.

We denote the space of linear maps by $\mathcal{L}(V, W)$.

Proof This set is non-empty because it contains the zero homomorphism. So
to show that it is a subspace we need only check that it is closed under the
operations. Let $f, g\colon V \to W$ be linear. Then the sum of the two is linear

$(f + g)(c_1\vec{v}_1 + c_2\vec{v}_2) = f(c_1\vec{v}_1 + c_2\vec{v}_2) + g(c_1\vec{v}_1 + c_2\vec{v}_2)$

$= c_1f(\vec{v}_1) + c_2f(\vec{v}_2) + c_1g(\vec{v}_1) + c_2g(\vec{v}_2)$

$= c_1(f + g)(\vec{v}_1) + c_2(f + g)(\vec{v}_2)$

and any scalar multiple of a map is also linear.

$(r \cdot f)(c_1\vec{v}_1 + c_2\vec{v}_2) = r(c_1f(\vec{v}_1) + c_2f(\vec{v}_2))$

$= c_1(r \cdot f)(\vec{v}_1) + c_2(r \cdot f)(\vec{v}_2)$

Hence $\mathcal{L}(V, W)$ is a subspace. QED
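
The two operations in the proof are pointwise, and that is easy to mirror in code. The sketch below represents maps as functions on tuples; the helpers add_maps and scale_map and the two sample maps f and g from $\mathbb{R}^2$ to itself are assumptions made for illustration, not a general-purpose implementation.

```python
def add_maps(f, g):
    """Pointwise sum: (f + g)(v) = f(v) + g(v)."""
    return lambda v: tuple(a + b for a, b in zip(f(v), g(v)))

def scale_map(r, f):
    """Pointwise scalar multiple: (r.f)(v) = r.f(v)."""
    return lambda v: tuple(r * a for a in f(v))

def combine(c1, v1, c2, v2):
    """The linear combination c1*v1 + c2*v2 of two tuples."""
    return tuple(c1 * a + c2 * b for a, b in zip(v1, v2))

# Two linear maps from R^2 to itself (chosen for this sketch).
def f(v):
    x, y = v
    return (2 * x, 3 * x - 2 * y)

def g(v):
    x, y = v
    return (0, 5 * x)

k = add_maps(scale_map(5, f), scale_map(-2, g))  # the map 5f - 2g

v1, v2, c1, c2 = (1, 1), (4, -3), 2, -7
print(k((1, 1)))  # (10, -5)
print(k(combine(c1, v1, c2, v2)) == combine(c1, k(v1), c2, k(v2)))  # True
```

The last line checks, on one sample combination, that the combined map preserves linear combinations as the lemma says it must.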

We started this section by defining homomorphisms as a generalization
of isomorphisms, isolating the structure preservation property. Some of the
properties of isomorphisms carried over unchanged while we adapted others.

However, if we thereby get an impression that the idea of 'homomorphism'
is in some way secondary to that of 'isomorphism' then that is mistaken. In the
rest of this chapter we shall work mostly with homomorphisms. This is partly
because any statement made about homomorphisms is automatically true about
isomorphisms but more because, while the isomorphism concept is more natural,
experience shows that the homomorphism concept is more fruitful and more
central to further progress.

Exercises 



/ 1.17 Decide if each h: 
(a) H{ ( y 



is linear. 




(c) h( 




✓ 1.19 Show that these two maps are homomorphisms.

(a) $d/dx\colon \mathcal{P}_3 \to \mathcal{P}_2$ given by $a_0 + a_1x + a_2x^2 + a_3x^3$ maps to $a_1 + 2a_2x + 3a_3x^2$

(b) $\int\colon \mathcal{P}_2 \to \mathcal{P}_3$ given by $b_0 + b_1x + b_2x^2$ maps to $b_0x + (b_1/2)x^2 + (b_2/3)x^3$

Are these maps inverse to each other?

1.20 Is (perpendicular) projection from $\mathbb{R}^3$ to the xz-plane a homomorphism?
Projection to the yz-plane? To the x-axis? The y-axis? The z-axis? Projection to the
origin?

1.21 Show that, while the maps from Example 1.3 preserve linear operations, they 
are not isomorphisms. 

1.22 Is an identity map a linear transformation? 

✓ 1.23 Stating that a function is 'linear' is different than stating that its graph is a
line.

(a) The function $f_1\colon \mathbb{R} \to \mathbb{R}$ given by $f_1(x) = 2x - 1$ has a graph that is a line.
Show that it is not a linear function.

(b) The function $f_2\colon \mathbb{R}^2 \to \mathbb{R}$ given by

$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto x + 2y$

does not have a graph that is a line. Show that it is a linear function.
✓ 1.24 Part of the definition of a linear function is that it respects addition. Does a
linear function respect subtraction?

1.25 Assume that $h$ is a linear transformation of $V$ and that $\langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ is a basis
of $V$. Prove each statement.

(a) If $h(\vec{\beta}_i) = \vec{0}$ for each basis vector then $h$ is the zero map.

(b) If $h(\vec{\beta}_i) = \vec{\beta}_i$ for each basis vector then $h$ is the identity map.

(c) If there is a scalar $r$ such that $h(\vec{\beta}_i) = r \cdot \vec{\beta}_i$ for each basis vector then $h(\vec{v}) = r \cdot \vec{v}$
for all vectors in $V$.

✓ 1.26 Consider the vector space $\mathbb{R}^+$ where vector addition and scalar multiplication
are not the ones inherited from $\mathbb{R}$ but rather are these: $a + b$ is the product of
$a$ and $b$, and $r \cdot a$ is the $r$-th power of $a$. (This was shown to be a vector space
in an earlier exercise.) Verify that the natural logarithm map $\ln\colon \mathbb{R}^+ \to \mathbb{R}$ is a
homomorphism between these two spaces. Is it an isomorphism?

✓ 1.27 Consider this transformation of $\mathbb{R}^2$.

$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x/2 \\ y/3 \end{pmatrix}$

Find the image under this map of this ellipse.

$\{\begin{pmatrix} x \\ y \end{pmatrix} \mid (x^2/4) + (y^2/9) = 1\}$

/ 1.28 Imagine a rope wound around the earth's equator so that it fits snugly (suppose 
that the earth is a sphere). How much extra rope must we add to raise the circle 
to a constant six feet off the ground? 

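✓ 1.29 Verify that this map $h\colon \mathbb{R}^3 \to \mathbb{R}$

$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \mapsto \begin{pmatrix} x \\ y \\ z \end{pmatrix} \cdot \begin{pmatrix} 3 \\ -1 \\ -1 \end{pmatrix} = 3x - y - z$

is linear. Generalize.

1.30 Show that every homomorphism from $\mathbb{R}^1$ to $\mathbb{R}^1$ acts via multiplication by a
scalar. Conclude that every nontrivial linear transformation of $\mathbb{R}^1$ is an isomorphism.
Is that true for transformations of $\mathbb{R}^2$? $\mathbb{R}^n$?

1.31 (a) Show that for any scalars $a_{1,1}, \ldots, a_{m,n}$ this map $h\colon \mathbb{R}^n \to \mathbb{R}^m$ is a
homomorphism.

$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \mapsto \begin{pmatrix} a_{1,1}x_1 + \cdots + a_{1,n}x_n \\ \vdots \\ a_{m,1}x_1 + \cdots + a_{m,n}x_n \end{pmatrix}$

(b) Show that for each $i$, the $i$-th derivative operator $d^i/dx^i$ is a linear
transformation of $\mathcal{P}_n$. Conclude that for any scalars $c_k, \ldots, c_0$ this map is a linear
transformation of that space.

$f \mapsto c_k\frac{d^k}{dx^k}f + c_{k-1}\frac{d^{k-1}}{dx^{k-1}}f + \cdots + c_1\frac{d}{dx}f + c_0f$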

1.32 Lemma 1.16 shows that a sum of linear functions is linear and that a scalar 
multiple of a linear function is linear. Show also that a composition of linear 
functions is linear. 

✓ 1.33 Where $f\colon V \to W$ is linear, suppose that $f(\vec{v}_1) = \vec{w}_1, \ldots, f(\vec{v}_n) = \vec{w}_n$ for some
vectors $\vec{w}_1, \ldots, \vec{w}_n$ from $W$.

(a) If the set of $\vec{w}$'s is independent, must the set of $\vec{v}$'s also be independent?

(b) If the set of $\vec{v}$'s is independent, must the set of $\vec{w}$'s also be independent?

(c) If the set of $\vec{w}$'s spans $W$, must the set of $\vec{v}$'s span $V$?

(d) If the set of $\vec{v}$'s spans $V$, must the set of $\vec{w}$'s span $W$?

1.34 Generalize Example 1.15 by proving that for every appropriate domain and
codomain the matrix transpose map is linear. What are the appropriate domains
and codomains?

1.35 (a) Where $\vec{u}, \vec{v} \in \mathbb{R}^n$, by definition the line segment connecting them is the set
$\ell = \{t \cdot \vec{u} + (1 - t) \cdot \vec{v} \mid t \in [0..1]\}$. Show that the image, under a homomorphism
$h$, of the segment between $\vec{u}$ and $\vec{v}$ is the segment between $h(\vec{u})$ and $h(\vec{v})$.

(b) A subset of $\mathbb{R}^n$ is convex if, for any two points in that set, the line segment
joining them lies entirely in that set. (The inside of a sphere is convex while the
skin of a sphere is not.) Prove that linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$ preserve the
property of set convexity.

✓ 1.36 Let $h\colon \mathbb{R}^n \to \mathbb{R}^m$ be a homomorphism.

(a) Show that the image under $h$ of a line in $\mathbb{R}^n$ is a (possibly degenerate) line in
$\mathbb{R}^m$.

(b) What happens to a $k$-dimensional linear surface?

1.37 Prove that the restriction of a homomorphism to a subspace of its domain is 
another homomorphism. 

1.38 Assume that $h\colon V \to W$ is linear.

(a) Show that the range space of this map $\{h(\vec{v}) \mid \vec{v} \in V\}$ is a subspace of the
codomain $W$.

(b) Show that the null space of this map $\{\vec{v} \in V \mid h(\vec{v}) = \vec{0}_W\}$ is a subspace of the
domain $V$.

(c) Show that if $U$ is a subspace of the domain $V$ then its image $\{h(\vec{u}) \mid \vec{u} \in U\}$ is
a subspace of the codomain $W$. This generalizes the first item.

(d) Generalize the second item. 

1.39 Consider the set of isomorphisms from a vector space to itself. Is this a subspace
of the space $\mathcal{L}(V, V)$ of homomorphisms from the space to itself?

1.40 Does Theorem 1.9 need that $\langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ is a basis? That is, can we still get a
well-defined and unique homomorphism if we drop either the condition that the
set of $\vec{\beta}$'s be linearly independent, or the condition that it span the domain?

1.41 Let $V$ be a vector space and assume that the maps $f_1, f_2\colon V \to \mathbb{R}^1$ are
linear.

(a) Define a map $F\colon V \to \mathbb{R}^2$ whose component functions are the given linear ones.

$\vec{v} \mapsto \begin{pmatrix} f_1(\vec{v}) \\ f_2(\vec{v}) \end{pmatrix}$

Show that $F$ is linear.

(b) Does the converse hold: is any linear map from $V$ to $\mathbb{R}^2$ made up of two
linear component maps to $\mathbb{R}^1$?

(c) Generalize.



II.2 Range space and Null space

Isomorphisms and homomorphisms both preserve structure. The difference is
that homomorphisms are subject to fewer restrictions because they needn't
be onto and needn't be one-to-one. We will examine what can happen with
homomorphisms that cannot happen with isomorphisms.

We first consider the effect of not requiring that a homomorphism be onto
its codomain. Of course, each homomorphism is onto some set, namely its range.
For example, the injection map $\iota\colon \mathbb{R}^2 \to \mathbb{R}^3$

$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}$

is a homomorphism that is not onto. But $\iota$ is onto the xy-plane subset of $\mathbb{R}^3$.

2.1 Lemma Under a homomorphism, the image of any subspace of the domain is 
a subspace of the codomain. In particular, the image of the entire space, the 
range of the homomorphism, is a subspace of the codomain. 

Proof Let $h\colon V \to W$ be linear and let $S$ be a subspace of the domain $V$. The
image $h(S)$ is a subset of the codomain $W$, which is nonempty because $S$ is
nonempty. Thus, to show that $h(S)$ is a subspace of $W$ we need only show that
it is closed under linear combinations of two vectors. If $h(\vec{s}_1)$ and $h(\vec{s}_2)$ are
members of $h(S)$ then $c_1 \cdot h(\vec{s}_1) + c_2 \cdot h(\vec{s}_2) = h(c_1 \cdot \vec{s}_1) + h(c_2 \cdot \vec{s}_2) = h(c_1 \cdot \vec{s}_1 + c_2 \cdot \vec{s}_2)$
is also a member of $h(S)$ because it is the image of $c_1 \cdot \vec{s}_1 + c_2 \cdot \vec{s}_2$ from $S$. QED

2.2 Definition The range space of a homomorphism $h\colon V \to W$ is

$\mathcal{R}(h) = \{h(\vec{v}) \mid \vec{v} \in V\}$

sometimes denoted $h(V)$. The dimension of the range space is the map's rank.

We shall soon see the connection between the rank of a map and the rank of a
matrix.

2.3 Example For the derivative map $d/dx\colon \mathcal{P}_3 \to \mathcal{P}_3$ given by $a_0 + a_1x + a_2x^2 +
a_3x^3 \mapsto a_1 + 2a_2x + 3a_3x^2$ the range space $\mathcal{R}(d/dx)$ is the set of quadratic
polynomials $\{r + sx + tx^2 \mid r, s, t \in \mathbb{R}\}$. Thus, this map's rank is 3.

2.4 Example With this homomorphism $h\colon \mathcal{M}_{2 \times 2} \to \mathcal{P}_3$

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto (a + b + 2d) + cx^2 + cx^3$

an image vector in the range can have any constant term, must have an $x$
coefficient of zero, and must have the same coefficient of $x^2$ as of $x^3$. That is,
the range space is $\mathcal{R}(h) = \{r + sx^2 + sx^3 \mid r, s \in \mathbb{R}\}$ and so the rank is 2.

The prior result shows that, in passing from the definition of isomorphism to 
the more general definition of homomorphism, omitting the 'onto' requirement 
doesn't make an essential difference. Any homomorphism is onto its range space. 

However, omitting the 'one-to-one' condition does make a difference. A 
homomorphism may have many elements of the domain that map to one element 
of the codomain. Below is a "bean" sketch of a many-to-one map between 
sets.* It shows three elements of the codomain that are each the image of many 
members of the domain. 

* More information on many-to-one maps is in the appendix.

Recall that for any function $h\colon V \to W$, the set of elements of $V$ that map to
$\vec{w} \in W$ is the inverse image $h^{-1}(\vec{w}) = \{\vec{v} \in V \mid h(\vec{v}) = \vec{w}\}$. Above, the left side
shows three inverse image sets.

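2.5 Example Consider the projection $\pi\colon \mathbb{R}^3 \to \mathbb{R}^2$

$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \overset{\pi}{\longmapsto} \begin{pmatrix} x \\ y \end{pmatrix}$

which is a homomorphism that is many-to-one. An inverse image set is a vertical
line of vectors in the domain.

One example is this.

$\pi^{-1}\left(\begin{pmatrix} 1 \\ 3 \end{pmatrix}\right) = \{\begin{pmatrix} 1 \\ 3 \\ z \end{pmatrix} \mid z \in \mathbb{R}\}$

2.6 Example This homomorphism $h\colon \mathbb{R}^2 \to \mathbb{R}^1$

$\begin{pmatrix} x \\ y \end{pmatrix} \overset{h}{\longmapsto} x + y$

is also many-to-one. For a fixed $w \in \mathbb{R}^1$, the inverse image $h^{-1}(w)$
is the set of plane vectors whose components add to $w$.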

In generalizing from isomorphisms to homomorphisms by dropping the
one-to-one condition, we lose the property that we've stated intuitively as that
the domain is "the same" as the range. We lose that the domain corresponds
perfectly to the range. What we retain, as the examples below illustrate, is that
a homomorphism describes how the domain is "like" or "analogous to" the range.

2.7 Example We think of $\mathbb{R}^3$ as like $\mathbb{R}^2$ except that vectors have an extra
component. That is, we think of the vector with components $x$, $y$, and $z$ as
somehow like the vector with components $x$ and $y$. In defining the projection
map $\pi$, we make precise which members of the domain we are thinking of as
related to which members of the codomain.

To understand how the preservation conditions in the definition of
homomorphism show that the domain elements are like the codomain elements,
we start by picturing $\mathbb{R}^2$ as the xy-plane inside of $\mathbb{R}^3$. (Of course, $\mathbb{R}^2$ is not
the xy-plane inside of $\mathbb{R}^3$ since the xy-plane is a set of three-tall vectors with
a third component of zero, but there is a natural correspondence.) Then the
preservation of addition property says that vectors in $\mathbb{R}^3$ act like their shadows
in the plane.




Thinking of $\pi(\vec{v})$ as the "shadow" of $\vec{v}$ in the plane gives this restatement: the
sum of the shadows $\pi(\vec{v}_1) + \pi(\vec{v}_2)$ equals the shadow of the sum $\pi(\vec{v}_1 + \vec{v}_2)$.
Preservation of scalar multiplication is similar.

Redrawing by showing the codomain on the right gives a picture that is 
uglier but is more faithful to the "bean" sketch. 




Again, the domain vectors that map to $\vec{w}_1$ lie in a vertical line; the picture
shows one in gray. Call any member of this inverse image $\pi^{-1}(\vec{w}_1)$ a "$\vec{w}_1$ vector."
Similarly, there is a vertical line of "$\vec{w}_2$ vectors" and a vertical line of "$\vec{w}_1 +
\vec{w}_2$ vectors." Now, saying that $\pi$ is a homomorphism is recognizing that if
$\pi(\vec{v}_1) = \vec{w}_1$ and $\pi(\vec{v}_2) = \vec{w}_2$ then $\pi(\vec{v}_1 + \vec{v}_2) = \pi(\vec{v}_1) + \pi(\vec{v}_2) = \vec{w}_1 + \vec{w}_2$. That
is, the classes add: any $\vec{w}_1$ vector plus any $\vec{w}_2$ vector equals a $\vec{w}_1 + \vec{w}_2$ vector.
Scalar multiplication is similar.

So although $\mathbb{R}^3$ and $\mathbb{R}^2$ are not isomorphic, $\pi$ describes a way in which they
are alike: vectors in $\mathbb{R}^3$ add as do the associated vectors in $\mathbb{R}^2$; vectors add as
their shadows add.

2.8 Example A homomorphism can express an analogy between spaces that is
more subtle than the prior one. For the map from Example 2.6

$\begin{pmatrix} x \\ y \end{pmatrix} \overset{h}{\longmapsto} x + y$

fix two numbers $w_1, w_2$ in the range $\mathbb{R}$. A $\vec{v}_1$ that maps to $w_1$ has components
that add to $w_1$, so the inverse image $h^{-1}(w_1)$ is the set of vectors with endpoint
on the diagonal line $x + y = w_1$. Think of these as "$w_1$ vectors." Similarly we
have "$w_2$ vectors" and "$w_1 + w_2$ vectors." The addition preservation property
says this: a "$w_1$ vector" plus a "$w_2$ vector" equals a "$w_1 + w_2$ vector."

Restated, if we add a $w_1$ vector to a $w_2$ vector then $h$ maps the result to a
$w_1 + w_2$ vector. Briefly, the sum of the images is the image of the sum. Even
more briefly, $h(\vec{v}_1) + h(\vec{v}_2) = h(\vec{v}_1 + \vec{v}_2)$.
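
The statement that the classes add can be spot-checked by sampling members of two inverse images and adding them. This is only a sketch under stated assumptions: the numbers $w_1 = 2$ and $w_2 = 5$, the helper sample_inverse_image, and the finite range of first components are ours, and a finite sample illustrates the property rather than proving it.

```python
from itertools import product

def h(v):
    """The map (x, y) -> x + y."""
    return v[0] + v[1]

def sample_inverse_image(w, xs):
    """Finitely many members (x, w - x) of the inverse image of w under h."""
    return [(x, w - x) for x in xs]

w1, w2 = 2, 5          # two numbers in the range, chosen for illustration
xs = range(-3, 4)      # first components to sample

class1 = sample_inverse_image(w1, xs)   # some "w1 vectors"
class2 = sample_inverse_image(w2, xs)   # some "w2 vectors"

# Every w1 vector plus every w2 vector should be a (w1 + w2) vector.
all_ok = all(
    h((a[0] + b[0], a[1] + b[1])) == w1 + w2
    for a, b in product(class1, class2)
)
print(all_ok)  # True
```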

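2.9 Example The inverse images can be structures other than lines. For the linear
map $h\colon \mathbb{R}^3 \to \mathbb{R}^2$

$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \mapsto \begin{pmatrix} x \\ x \end{pmatrix}$

the inverse image sets are planes $x = 0$, $x = 1$, etc., perpendicular to the x-axis.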



We won't describe how every homomorphism that we will use is an analogy 
because the formal sense that we make of "alike in that ..." is 'a homomorphism 
exists such that . . . '. Nonetheless, the idea that a homomorphism between two 
spaces expresses how the domain's vectors fall into classes that act like the 
range's vectors is a good way to view homomorphisms. 

Another reason that we won't treat all of the homomorphisms that we see as 
above is that many vector spaces are hard to draw, e.g., a space of polynomials. 
But there is nothing wrong with leveraging those spaces that we can draw. We 
derive two insights from the three examples 2.7, 2.8, and 2.9. 

The first insight is that in all three examples the inverse image of the range's
zero vector is a line or plane through the origin, a subspace of the domain.

2.10 Lemma For any homomorphism, the inverse image of a subspace of the
range is a subspace of the domain. In particular, the inverse image of the trivial
subspace of the range is a subspace of the domain.

2.11 Remark The examples above consider inverse images of single vectors but
this result is about inverse images of sets $h^{-1}(S) = \{\vec{v} \in V \mid h(\vec{v}) \in S\}$. We use
the same term in both cases by defining the inverse image of a single element
$h^{-1}(\vec{w})$ as the inverse image of the one-element set $h^{-1}(\{\vec{w}\})$.

Proof Let $h\colon V \to W$ be a homomorphism and let $S$ be a subspace of the
range space of $h$. Consider the inverse image of $S$. It is nonempty because it
contains $\vec{0}_V$, since $h(\vec{0}_V) = \vec{0}_W$ and $\vec{0}_W$ is an element of $S$, as $S$ is a subspace.
To finish we show that it is closed under linear combinations. Let $\vec{v}_1$ and $\vec{v}_2$
be two elements of $h^{-1}(S)$. Then $h(\vec{v}_1)$ and $h(\vec{v}_2)$ are elements of $S$. That
implies that $c_1\vec{v}_1 + c_2\vec{v}_2$ is an element of the inverse image $h^{-1}(S)$ because
$h(c_1\vec{v}_1 + c_2\vec{v}_2) = c_1h(\vec{v}_1) + c_2h(\vec{v}_2)$ is a member of $S$. QED

2.12 Definition The null space or kernel of a linear map $h\colon V \to W$ is the inverse
image of $\vec{0}_W$.

$\mathcal{N}(h) = h^{-1}(\vec{0}_W) = \{\vec{v} \in V \mid h(\vec{v}) = \vec{0}_W\}$

The dimension of the null space is the map's nullity.




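2.13 Example The map from Example 2.3 has this null space $\mathcal{N}(d/dx) =
\{a_0 + 0x + 0x^2 + 0x^3 \mid a_0 \in \mathbb{R}\}$ so its nullity is 1.

2.14 Example The map from Example 2.4 has this null space and nullity 2.

$\mathcal{N}(h) = \{\begin{pmatrix} a & b \\ 0 & -(a + b)/2 \end{pmatrix} \mid a, b \in \mathbb{R}\}$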

Now for the second insight from the above pictures. In Example 2.7 each of
the vertical lines squashes down to a single point: in passing from the domain
to the range, $\pi$ takes all of these one-dimensional vertical lines and maps them to
a point, leaving the range one dimension smaller than the domain. Similarly, in
Example 2.8 the two-dimensional domain compresses to a one-dimensional range
by breaking the domain into the diagonal lines and mapping each of those to a single
member of the range. Finally, in Example 2.9 the domain breaks into planes
which get squashed to a point and so the map starts with a three-dimensional
domain but ends with a one-dimensional range. (In this third example the
codomain is two-dimensional but the range of the map is only one-dimensional
and it is the dimension of the range that matters.)

2.15 Theorem A linear map's rank plus its nullity equals the dimension of its
domain.

Proof Let $h\colon V \to W$ be linear and let $B_N = \langle \vec{\beta}_1, \ldots, \vec{\beta}_k \rangle$ be a basis for
the null space. Expand that to a basis $B_V = \langle \vec{\beta}_1, \ldots, \vec{\beta}_k, \vec{\beta}_{k+1}, \ldots, \vec{\beta}_n \rangle$ for
the entire domain, using Corollary Two.III.2.12. We shall show that $B_R =
\langle h(\vec{\beta}_{k+1}), \ldots, h(\vec{\beta}_n) \rangle$ is a basis for the range space. With that, counting the
size of these bases gives the result.

To see that $B_R$ is linearly independent, consider $\vec{0}_W = c_{k+1}h(\vec{\beta}_{k+1}) + \cdots +
c_nh(\vec{\beta}_n)$. The function is linear so we have $\vec{0}_W = h(c_{k+1}\vec{\beta}_{k+1} + \cdots + c_n\vec{\beta}_n)$
and therefore $c_{k+1}\vec{\beta}_{k+1} + \cdots + c_n\vec{\beta}_n$ is in the null space of $h$. As $B_N$ is a basis
for the null space there are scalars $c_1, \ldots, c_k$ satisfying this relationship.

$c_1\vec{\beta}_1 + \cdots + c_k\vec{\beta}_k = c_{k+1}\vec{\beta}_{k+1} + \cdots + c_n\vec{\beta}_n$

But this is an equation among the members of $B_V$, which is a basis for $V$, so
each $c_i$ equals 0. Therefore $B_R$ is linearly independent.

To show that $B_R$ spans the range space, consider $h(\vec{v}) \in \mathcal{R}(h)$ and write $\vec{v}$ as
a linear combination $\vec{v} = c_1\vec{\beta}_1 + \cdots + c_n\vec{\beta}_n$ of members of $B_V$. This gives $h(\vec{v}) =
h(c_1\vec{\beta}_1 + \cdots + c_n\vec{\beta}_n) = c_1h(\vec{\beta}_1) + \cdots + c_kh(\vec{\beta}_k) + c_{k+1}h(\vec{\beta}_{k+1}) + \cdots + c_nh(\vec{\beta}_n)$
and since $\vec{\beta}_1, \ldots, \vec{\beta}_k$ are in the null space, we have that $h(\vec{v}) = \vec{0} + \cdots + \vec{0} +
c_{k+1}h(\vec{\beta}_{k+1}) + \cdots + c_nh(\vec{\beta}_n)$. Thus, $h(\vec{v})$ is a linear combination of members
of $B_R$, and so $B_R$ spans the range space. QED

2.16 Example Where $h\colon \mathbb{R}^3 \to \mathbb{R}^4$ is

$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \overset{h}{\longmapsto} \begin{pmatrix} x \\ 0 \\ y \\ 0 \end{pmatrix}$

the range space and null space are

$\mathcal{R}(h) = \{\begin{pmatrix} a \\ 0 \\ b \\ 0 \end{pmatrix} \mid a, b \in \mathbb{R}\}$ and $\mathcal{N}(h) = \{\begin{pmatrix} 0 \\ 0 \\ z \end{pmatrix} \mid z \in \mathbb{R}\}$

and so the rank of $h$ is 2 while the nullity is 1.

2.17 Example If $t\colon \mathbb{R} \to \mathbb{R}$ is the linear transformation $x \mapsto -4x$, then the range
is $\mathcal{R}(t) = \mathbb{R}^1$. The rank is 1 and the nullity is 0.
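
For a map between spaces of column vectors we can check the theorem by direct computation. The sketch below is an illustration under stated assumptions: the map from $\mathbb{R}^4$ to $\mathbb{R}^3$ is hypothetical, it is stored as the rows of coefficients of its coordinate functions, and the helper names rref, null_space_basis, and apply are ours. Gauss-Jordan reduction gives the rank as the number of pivot columns, and each free column yields one null space basis vector, so the count rank + nullity = 4 is built into the method; what the code adds is a direct confirmation that each of those vectors really is sent to zero.

```python
from fractions import Fraction

def rref(matrix):
    """Return (reduced rows, pivot column indices), using exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column, so it is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def null_space_basis(matrix):
    """One basis vector of the null space for each free (non-pivot) column."""
    rows, pivots = rref(matrix)
    ncols = len(matrix[0])
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        vec = [Fraction(0)] * ncols
        vec[free] = Fraction(1)
        for row, p in zip(rows, pivots):  # the k-th reduced row holds the k-th pivot
            vec[p] = -row[free]
        basis.append(vec)
    return basis

def apply(matrix, v):
    """Apply the map: each output component is a coordinate function of v."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

# A hypothetical linear map from R^4 to R^3.
A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]

_, pivots = rref(A)
rank = len(pivots)
kernel = null_space_basis(A)
nullity = len(kernel)
print(rank, nullity, rank + nullity == len(A[0]))   # 2 2 True
print(all(apply(A, v) == [0, 0, 0] for v in kernel))  # True
```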



2.18 Corollary The rank of a linear map is less than or equal to the dimension of 
the domain. Equality holds if and only if the nullity of the map is 0. 



We know that an isomorphism exists between two spaces if and only if the
dimension of the range equals the dimension of the domain. We have now seen
that for a homomorphism to exist a necessary condition is that the dimension of
the range must be less than or equal to the dimension of the domain. For instance,
there is no homomorphism from $\mathbb{R}^2$ onto $\mathbb{R}^3$. There are many homomorphisms
from $\mathbb{R}^2$ into $\mathbb{R}^3$, but none onto.

The range space of a linear map can be of dimension strictly less than the
dimension of the domain and so linearly independent sets in the domain may
map to linearly dependent sets in the range. (Example 2.3's derivative
transformation on $\mathcal{P}_3$ has a domain of dimension 4 but a range of dimension 3 and the
derivative sends $\{1, x, x^2, x^3\}$ to $\{0, 1, 2x, 3x^2\}$.) That is, under a homomorphism
independence may be lost. In contrast, dependence stays.

2.19 Lemma Under a linear map, the image of a linearly dependent set is linearly 
dependent. 

Proof Suppose that $c_1\vec{v}_1 + \cdots + c_n\vec{v}_n = \vec{0}_V$ with some $c_i$ nonzero. Apply $h$ to
both sides: $h(c_1\vec{v}_1 + \cdots + c_n\vec{v}_n) = c_1h(\vec{v}_1) + \cdots + c_nh(\vec{v}_n)$ and $h(\vec{0}_V) = \vec{0}_W$.
Thus we have $c_1h(\vec{v}_1) + \cdots + c_nh(\vec{v}_n) = \vec{0}_W$ with some $c_i$ nonzero. QED

When is independence not lost? The obvious sufficient condition is when 
the homomorphism is an isomorphism. This condition is also necessary; see 
Exercise 35. We will finish this subsection comparing homomorphisms with 
isomorphisms by observing that a one-to-one homomorphism is an isomorphism 
from its domain onto its range. 

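2.20 Example This one-to-one homomorphism $\iota\colon \mathbb{R}^2 \to \mathbb{R}^3$

$\begin{pmatrix} x \\ y \end{pmatrix} \overset{\iota}{\longmapsto} \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}$

gives a correspondence between $\mathbb{R}^2$ and the xy-plane subset of $\mathbb{R}^3$.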

2.21 Theorem In an $n$-dimensional vector space $V$, these are equivalent statements
about a linear map $h\colon V \to W$.

(1) $h$ is one-to-one

(2) $h$ has an inverse from its range to its domain that is linear

(3) $\mathcal{N}(h) = \{\vec{0}\}$, that is, $\operatorname{nullity}(h) = 0$

(4) $\operatorname{rank}(h) = n$

(5) if $\langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ is a basis for $V$ then $\langle h(\vec{\beta}_1), \ldots, h(\vec{\beta}_n) \rangle$ is a basis for $\mathcal{R}(h)$

Proof We will first show that (1) $\iff$ (2). We will then show that (1) $\implies$
(3) $\implies$ (4) $\implies$ (5) $\implies$ (2).

For (1) $\implies$ (2), suppose that the linear map $h$ is one-to-one and so has an
inverse $h^{-1}\colon \mathcal{R}(h) \to V$. The domain of that inverse is the range of $h$ and thus
a linear combination of two members of it has the form $c_1h(\vec{v}_1) + c_2h(\vec{v}_2)$. On
that combination, the inverse $h^{-1}$ gives this.

$h^{-1}(c_1h(\vec{v}_1) + c_2h(\vec{v}_2)) = h^{-1}(h(c_1\vec{v}_1 + c_2\vec{v}_2))$

$= h^{-1} \circ h\,(c_1\vec{v}_1 + c_2\vec{v}_2)$

$= c_1\vec{v}_1 + c_2\vec{v}_2$

$= c_1 \cdot h^{-1}(h(\vec{v}_1)) + c_2 \cdot h^{-1}(h(\vec{v}_2))$

Thus if a linear map has an inverse, then the inverse must be linear. But this also
gives the (2) $\implies$ (1) implication, because the inverse itself must be one-to-one.

Of the remaining implications, (1) $\implies$ (3) holds because any homomorphism
maps $\vec{0}_V$ to $\vec{0}_W$, but a one-to-one map sends at most one member of $V$ to $\vec{0}_W$.

Next, (3) $\implies$ (4) is true since rank plus nullity equals the dimension of the
domain.

For (4) $\implies$ (5), to show that $\langle h(\vec{\beta}_1), \ldots, h(\vec{\beta}_n) \rangle$ is a basis for the range
space we need only show that it is a spanning set, because by assumption
the range has dimension $n$. Consider $h(\vec{v}) \in \mathcal{R}(h)$. Expressing $\vec{v}$ as a linear
combination of basis elements produces $h(\vec{v}) = h(c_1\vec{\beta}_1 + c_2\vec{\beta}_2 + \cdots + c_n\vec{\beta}_n)$,
which gives that $h(\vec{v}) = c_1h(\vec{\beta}_1) + \cdots + c_nh(\vec{\beta}_n)$, as desired.

Finally, for the (5) $\implies$ (2) implication, assume that $\langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ is a basis
for $V$ so that $\langle h(\vec{\beta}_1), \ldots, h(\vec{\beta}_n) \rangle$ is a basis for $\mathcal{R}(h)$. Then every $\vec{w} \in \mathcal{R}(h)$ has
the unique representation $\vec{w} = c_1h(\vec{\beta}_1) + \cdots + c_nh(\vec{\beta}_n)$. Define a map from
$\mathcal{R}(h)$ to $V$ by

$\vec{w} \mapsto c_1\vec{\beta}_1 + c_2\vec{\beta}_2 + \cdots + c_n\vec{\beta}_n$

(uniqueness of the representation makes this well-defined). Checking that it is
linear and that it is the inverse of $h$ are easy. QED
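
A small concrete case may help in reading the theorem. The sketch below uses a hypothetical one-to-one map $h\colon \mathbb{R}^2 \to \mathbb{R}^3$ sending $(x, y)$ to $(x + y, x - y, 2x)$; its range is the set of triples $(a, b, c)$ with $c = a + b$, and on that range the inverse sends $(a, b, c)$ to $((a + b)/2, (a - b)/2)$. The map and the helper names are our assumptions, chosen only to illustrate items (1) and (2).

```python
from fractions import Fraction

def h(v):
    """A hypothetical one-to-one linear map from R^2 to R^3."""
    x, y = v
    return (x + y, x - y, 2 * x)

def h_inverse(w):
    """The inverse of h, defined only on the range {(a, b, c) | c = a + b}."""
    a, b, c = w
    if c != a + b:
        raise ValueError("the vector is not in the range of h")
    return (Fraction(a + b, 2), Fraction(a - b, 2))

def combine(c1, v1, c2, v2):
    """The linear combination c1*v1 + c2*v2 of two tuples."""
    return tuple(c1 * a + c2 * b for a, b in zip(v1, v2))

v = (3, -2)
print(h_inverse(h(v)) == v)  # True: the inverse undoes h

# The inverse preserves a sample linear combination of range vectors.
w1, w2 = h((1, 0)), h((0, 1))
c1, c2 = 4, -3
lhs = h_inverse(combine(c1, w1, c2, w2))
rhs = combine(c1, h_inverse(w1), c2, h_inverse(w2))
print(lhs == rhs)  # True
```

Here the null space is trivial, since $x + y = 0$ and $x - y = 0$ force $x = y = 0$, so the rank is 2, in agreement with items (3) and (4).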

We have now seen that a linear map expresses how the structure of the 
domain is like that of the range. We can think of such a map as organizing the 
domain space into inverse images of points in the range. In the special case that 
the map is one-to-one, each inverse image is a single point and the map is an 
isomorphism between the domain and the range. 

Exercises 

✓ 2.22 Let $h\colon \mathcal{P}_3 \to \mathcal{P}_4$ be given by $p(x) \mapsto x \cdot p(x)$. Which of these are in the null
space? Which are in the range space?

(a) $x^3$ (b) $0$ (c) $7$ (d) $12x - 0.5x^3$ (e) $1 + 3x^2 - x^3$

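✓ 2.23 Find the null space, nullity, range space, and rank of each map.

(a) $h\colon \mathbb{R}^2 \to \mathcal{P}_3$ given by

$\begin{pmatrix} a \\ b \end{pmatrix} \mapsto a + ax + ax^2$

(b) $h\colon \mathcal{M}_{2 \times 2} \to \mathbb{R}$ given by

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto a + d$

(c) $h\colon \mathcal{M}_{2 \times 2} \to \mathcal{P}_2$ given by

$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto a + b + c + dx^2$

(d) the zero map $Z\colon \mathbb{R}^3 \to \mathbb{R}^4$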

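✓ 2.24 Find the nullity of each map.

(a) $h\colon \mathbb{R}^5 \to \mathbb{R}^8$ of rank five (b) $h\colon \mathcal{P}_3 \to \mathcal{P}_3$ of rank one

(c) $h\colon \mathbb{R}^6 \to \mathbb{R}^3$, an onto map (d) $h\colon \mathcal{M}_{3 \times 3} \to \mathcal{M}_{3 \times 3}$, onto

✓ 2.25 What is the null space of the differentiation transformation $d/dx\colon \mathcal{P}_n \to \mathcal{P}_n$?
What is the null space of the second derivative, as a transformation of $\mathcal{P}_n$? The
$k$-th derivative?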

2.26 Example 2.7 restates the first condition in the definition of homomorphism as 
'the shadow of a sum is the sum of the shadows'. Restate the second condition in 
the same style. 

2.27 For the homomorphism $h\colon \mathcal{P}_3 \to \mathcal{P}_3$ given by $h(a_0 + a_1x + a_2x^2 + a_3x^3) =
a_0 + (a_0 + a_1)x + (a_2 + a_3)x^3$ find these.

(a) $\mathcal{N}(h)$ (b) $h^{-1}(2 - x^3)$ (c) $h^{-1}(1 + x^2)$
✓ 2.28 For the map $f\colon \mathbb{R}^2 \to \mathbb{R}$ given by

$f\begin{pmatrix} x \\ y \end{pmatrix} = 2x + y$

sketch these inverse image sets: $f^{-1}(-3)$, $f^{-1}(0)$, and $f^{-1}(1)$.
✓ 2.29 Each of these transformations of $\mathcal{P}_3$ is one-to-one. Find the inverse of each.

(a) $a_0 + a_1x + a_2x^2 + a_3x^3 \mapsto a_0 + a_1x + 2a_2x^2 + 3a_3x^3$

(b) $a_0 + a_1x + a_2x^2 + a_3x^3 \mapsto a_0 + a_2x + a_1x^2 + a_3x^3$

(c) $a_0 + a_1x + a_2x^2 + a_3x^3 \mapsto a_1 + a_2x + a_3x^2 + a_0x^3$

(d) $a_0 + a_1x + a_2x^2 + a_3x^3 \mapsto a_0 + (a_0 + a_1)x + (a_0 + a_1 + a_2)x^2 + (a_0 + a_1 + a_2 + a_3)x^3$

2.30 Describe the null space and range space of a transformation given by $\vec{v} \mapsto 2\vec{v}$.

2.31 List all pairs $(\operatorname{rank}(h), \operatorname{nullity}(h))$ that are possible for linear maps from $\mathbb{R}^5$ to
$\mathbb{R}^3$.

2.32 Does the differentiation map $d/dx\colon \mathcal{P}_n \to \mathcal{P}_n$ have an inverse?

✓ 2.33 Find the nullity of the map $h\colon \mathcal{P}_n \to \mathbb{R}$ given by

$a_0 + a_1x + \cdots + a_nx^n \mapsto \int_{x=0}^{x=1} a_0 + a_1x + \cdots + a_nx^n \, dx$.

2.34 (a) Prove that a homomorphism is onto if and only if its rank equals the 
dimension of its codomain. 

(b) Conclude that a homomorphism between vector spaces with the same dimen- 
sion is one-to-one if and only if it is onto. 

2.35 Show that a linear map is one-to-one if and only if it preserves linear indepen- 
dence. 

2.36 Corollary 2.18 says that for there to be an onto homomorphism from a vector 
space V to a vector space W, it is necessary that the dimension of W be less 
than or equal to the dimension of V. Prove that this condition is also sufficient; 
use Theorem 1.9 to show that if the dimension of W is less than or equal to the 
dimension of V, then there is a homomorphism from V to W that is onto. 

/ 2.37 Recall that the null space is a subset of the domain and the range space is a 
subset of the codomain. Are they necessarily distinct? Is there a homomorphism 
that has a nontrivial intersection of its null space and its range space? 
2.38 Prove that the image of a span equals the span of the images. That is, where
$h\colon V \to W$ is linear, prove that if $S$ is a subset of $V$ then $h([S])$ equals $[h(S)]$. This
generalizes Lemma 2.1 since it shows that if $U$ is any subspace of $V$ then its image
$\{h(\vec{u}) \mid \vec{u} \in U\}$ is a subspace of $W$, because the span of the set $U$ is $U$.

✓ 2.39 (a) Prove that for any linear map $h\colon V \to W$ and any $\vec{w} \in W$, the set $h^{-1}(\vec{w})$
has the form

$\{\vec{v} + \vec{n} \mid \vec{n} \in \mathcal{N}(h)\}$

for $\vec{v} \in V$ with $h(\vec{v}) = \vec{w}$ (if $h$ is not onto then this set may be empty). Such a
set is a coset of $\mathcal{N}(h)$ and we denote it as $\vec{v} + \mathcal{N}(h)$.

(b) Consider the map $t\colon \mathbb{R}^2 \to \mathbb{R}^2$ given by

$\begin{pmatrix} x \\ y \end{pmatrix} \overset{t}{\longmapsto} \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix}$

for some scalars $a$, $b$, $c$, and $d$. Prove that $t$ is linear.

(c) Conclude from the prior two items that for any linear system of the form

$ax + by = e$
$cx + dy = f$

we can write the solution set (the vectors are members of $\mathbb{R}^2$)

$\{\vec{p} + \vec{h} \mid \vec{h} \text{ satisfies the associated homogeneous system}\}$

where $\vec{p}$ is a particular solution of that linear system (if there is no particular
solution then the above set is empty).

(d) Show that this map $h\colon \mathbb{R}^n \to \mathbb{R}^m$ is linear

$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \mapsto \begin{pmatrix} a_{1,1}x_1 + \cdots + a_{1,n}x_n \\ \vdots \\ a_{m,1}x_1 + \cdots + a_{m,n}x_n \end{pmatrix}$

for any scalars $a_{1,1}, \ldots, a_{m,n}$. Extend the conclusion made in the prior item.

(e) Show that the $k$-th derivative map is a linear transformation of $\mathcal{P}_n$ for each $k$.
Prove that this map is a linear transformation of that space

$f \mapsto c_k\frac{d^k}{dx^k}f + c_{k-1}\frac{d^{k-1}}{dx^{k-1}}f + \cdots + c_1\frac{d}{dx}f + c_0f$

for any scalars $c_k, \ldots, c_0$. Draw a conclusion as above.

2.40 Prove that for any transformation $t\colon V \to V$ that is rank one, the map given by
composing the operator with itself $t \circ t\colon V \to V$ satisfies $t \circ t = r \cdot t$ for some real
number $r$.

2.41 Let $h\colon V \to \mathbb{R}$ be a homomorphism, but not the zero homomorphism. Prove
that if $\langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ is a basis for the null space and if $\vec{v} \in V$ is not in the null space
then $\langle \vec{v}, \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ is a basis for the entire domain $V$.

2.42 Show that for any space $V$ of dimension $n$, the dual space

$$\mathcal{L}(V, \mathbb{R}) = \{ h\colon V \to \mathbb{R} \mid h \text{ is linear} \}$$

is isomorphic to $\mathbb{R}^n$. It is often denoted $V^*$. Conclude that $V^* \cong V$.

2.43 Show that any linear map is the sum of maps of rank one. 

2.44 Is 'is homomorphic to' an equivalence relation? (Hint: the difficulty is to decide
on an appropriate meaning for the quoted phrase.)

2.45 Show that the range spaces and null spaces of powers of linear maps $t\colon V \to V$
form descending

$$V \supseteq \mathscr{R}(t) \supseteq \mathscr{R}(t^2) \supseteq \cdots$$

and ascending

$$\{\vec{0}\} \subseteq \mathscr{N}(t) \subseteq \mathscr{N}(t^2) \subseteq \cdots$$

chains. Also show that if $k$ is such that $\mathscr{R}(t^k) = \mathscr{R}(t^{k+1})$ then all following range
spaces are equal: $\mathscr{R}(t^k) = \mathscr{R}(t^{k+1}) = \mathscr{R}(t^{k+2}) = \cdots$. Similarly, if $\mathscr{N}(t^k) = \mathscr{N}(t^{k+1})$
then $\mathscr{N}(t^k) = \mathscr{N}(t^{k+1}) = \mathscr{N}(t^{k+2}) = \cdots$.

III Computing Linear Maps

The prior section shows that a linear map is determined by its action on a basis.
The equation

$$h(\vec{v}) = h(c_1 \cdot \vec{\beta}_1 + \cdots + c_n \cdot \vec{\beta}_n) = c_1 \cdot h(\vec{\beta}_1) + \cdots + c_n \cdot h(\vec{\beta}_n)$$

describes how we get the value of the map on any vector $\vec{v}$ by starting from the
value of the map on the vectors $\vec{\beta}_i$ in a basis and extending linearly.

This section gives a convenient scheme to use the representations of $h(\vec{\beta}_1)$,
..., $h(\vec{\beta}_n)$ to compute, from the representation of a vector in the domain
$\operatorname{Rep}_B(\vec{v})$, the representation of that vector's image in the codomain $\operatorname{Rep}_D(h(\vec{v}))$.



III.1 Representing Linear Maps with Matrices

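1.1 Example For the spaces $\mathbb{R}^2$ and $\mathbb{R}^3$ fix

$$B = \langle \begin{pmatrix} 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 4 \end{pmatrix} \rangle \quad\text{and}\quad D = \langle \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -2 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \rangle$$

as the bases. Consider the map $h\colon \mathbb{R}^2 \to \mathbb{R}^3$ that is determined by this association.

$$\begin{pmatrix} 2 \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \qquad \begin{pmatrix} 1 \\ 4 \end{pmatrix} \mapsto \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$$

To compute the action of this map on any vector at all from the domain we first
express, with respect to the codomain's basis, $h(\vec{\beta}_1)$

$$\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = 0 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - \frac{1}{2} \begin{pmatrix} 0 \\ -2 \\ 0 \end{pmatrix} + 1 \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \quad\text{so}\quad \operatorname{Rep}_D(h(\vec{\beta}_1)) = \begin{pmatrix} 0 \\ -1/2 \\ 1 \end{pmatrix}_D$$

and $h(\vec{\beta}_2)$.

$$\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} = 1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - 1 \begin{pmatrix} 0 \\ -2 \\ 0 \end{pmatrix} + 0 \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \quad\text{so}\quad \operatorname{Rep}_D(h(\vec{\beta}_2)) = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}_D$$

Then for any member $\vec{v}$ of the domain we can compute $h(\vec{v})$ using the $h(\vec{\beta}_i)$'s.

$$\begin{aligned}
h(\vec{v}) &= h\Bigl(c_1 \cdot \begin{pmatrix} 2 \\ 0 \end{pmatrix} + c_2 \cdot \begin{pmatrix} 1 \\ 4 \end{pmatrix}\Bigr) \\
&= c_1 \cdot h\Bigl(\begin{pmatrix} 2 \\ 0 \end{pmatrix}\Bigr) + c_2 \cdot h\Bigl(\begin{pmatrix} 1 \\ 4 \end{pmatrix}\Bigr) \\
&= c_1 \cdot \Bigl(0 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - \frac{1}{2} \begin{pmatrix} 0 \\ -2 \\ 0 \end{pmatrix} + 1 \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}\Bigr) + c_2 \cdot \Bigl(1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - 1 \begin{pmatrix} 0 \\ -2 \\ 0 \end{pmatrix} + 0 \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}\Bigr) \\
&= (0c_1 + 1c_2) \cdot \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \bigl(-\tfrac{1}{2}c_1 - 1c_2\bigr) \cdot \begin{pmatrix} 0 \\ -2 \\ 0 \end{pmatrix} + (1c_1 + 0c_2) \cdot \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}
\end{aligned}$$

Thus, if $\operatorname{Rep}_B(\vec{v}) = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}_B$ then

$$\operatorname{Rep}_D(h(\vec{v})) = \begin{pmatrix} 0c_1 + 1c_2 \\ -(1/2)c_1 - 1c_2 \\ 1c_1 + 0c_2 \end{pmatrix}_D.$$

For instance, since $\operatorname{Rep}_B\Bigl(\begin{pmatrix} 4 \\ 8 \end{pmatrix}\Bigr) = \begin{pmatrix} 1 \\ 2 \end{pmatrix}_B$ we have $\operatorname{Rep}_D\Bigl(h\Bigl(\begin{pmatrix} 4 \\ 8 \end{pmatrix}\Bigr)\Bigr) = \begin{pmatrix} 2 \\ -5/2 \\ 1 \end{pmatrix}_D$.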

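We express computations like the one above with a matrix notation.

$$\begin{pmatrix} 0 & 1 \\ -1/2 & -1 \\ 1 & 0 \end{pmatrix}_{B,D} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}_B = \begin{pmatrix} 0c_1 + 1c_2 \\ (-1/2)c_1 - 1c_2 \\ 1c_1 + 0c_2 \end{pmatrix}_D$$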

In the middle is the argument $\vec{v}$ to the map, represented with respect to the
domain's basis $B$ by the column vector with components $c_1$ and $c_2$. On the
right is the value of the map on that argument $h(\vec{v})$, represented with respect to
the codomain's basis $D$. The matrix on the left is the new thing. We will use it
to represent the map and we will think of the above equation as representing an
application of the map to the vector.

That matrix consists of the coefficients from the vector on the right, 0 and
1 from the first row, $-1/2$ and $-1$ from the second row, and 1 and 0 from the
third row. That is, we make it by adjoining the vectors representing the $h(\vec{\beta}_i)$'s.

$$\left(\begin{array}{c|c} \vdots & \vdots \\ \operatorname{Rep}_D(h(\vec{\beta}_1)) & \operatorname{Rep}_D(h(\vec{\beta}_2)) \\ \vdots & \vdots \end{array}\right)$$

1.2 Definition Suppose that $V$ and $W$ are vector spaces of dimensions $n$ and $m$
with bases $B$ and $D$, and that $h\colon V \to W$ is a linear map. If

$$\operatorname{Rep}_D(h(\vec{\beta}_1)) = \begin{pmatrix} h_{1,1} \\ h_{2,1} \\ \vdots \\ h_{m,1} \end{pmatrix}_D \quad \ldots \quad \operatorname{Rep}_D(h(\vec{\beta}_n)) = \begin{pmatrix} h_{1,n} \\ h_{2,n} \\ \vdots \\ h_{m,n} \end{pmatrix}_D$$

then

$$\operatorname{Rep}_{B,D}(h) = \begin{pmatrix} h_{1,1} & h_{1,2} & \ldots & h_{1,n} \\ h_{2,1} & h_{2,2} & \ldots & h_{2,n} \\ & \vdots & & \\ h_{m,1} & h_{m,2} & \ldots & h_{m,n} \end{pmatrix}_{B,D}$$

is the matrix representation of $h$ with respect to $B, D$.

In that matrix the number of columns $n$ is the dimension of the map's domain
while the number of rows $m$ is the dimension of the codomain.

We use lower case letters for a map, upper case for the matrix, and lower case
again for the entries of the matrix. Thus for the map $h$, the matrix representing
it is $H$, with entries $h_{i,j}$.

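1.3 Example If $h\colon \mathbb{R}^3 \to \mathcal{P}_1$ is

$$\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \mapsto (2a_1 + a_2) + (-a_3)x$$

then where

$$B = \langle \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} \rangle \quad\text{and}\quad D = \langle 1 + x, -1 + x \rangle$$

the action of $h$ on $B$ is this.

$$\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \mapsto -x \qquad \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix} \mapsto 2 \qquad \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} \mapsto 4$$

A simple calculation

$$\operatorname{Rep}_D(-x) = \begin{pmatrix} -1/2 \\ -1/2 \end{pmatrix}_D \qquad \operatorname{Rep}_D(2) = \begin{pmatrix} 1 \\ -1 \end{pmatrix}_D \qquad \operatorname{Rep}_D(4) = \begin{pmatrix} 2 \\ -2 \end{pmatrix}_D$$

shows that this is the matrix representing $h$ with respect to the bases.

$$\operatorname{Rep}_{B,D}(h) = \begin{pmatrix} -1/2 & 1 & 2 \\ -1/2 & -1 & -2 \end{pmatrix}_{B,D}$$

1.4 Theorem Assume that $V$ and $W$ are vector spaces of dimensions $n$ and $m$
with bases $B$ and $D$, and that $h\colon V \to W$ is a linear map. If $h$ is represented by

$$\operatorname{Rep}_{B,D}(h) = \begin{pmatrix} h_{1,1} & h_{1,2} & \ldots & h_{1,n} \\ h_{2,1} & h_{2,2} & \ldots & h_{2,n} \\ & \vdots & & \\ h_{m,1} & h_{m,2} & \ldots & h_{m,n} \end{pmatrix}_{B,D}$$

and $\vec{v} \in V$ is represented by

$$\operatorname{Rep}_B(\vec{v}) = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}_B$$

then the representation of the image of $\vec{v}$ is this.

$$\operatorname{Rep}_D(h(\vec{v})) = \begin{pmatrix} h_{1,1}c_1 + h_{1,2}c_2 + \cdots + h_{1,n}c_n \\ h_{2,1}c_1 + h_{2,2}c_2 + \cdots + h_{2,n}c_n \\ \vdots \\ h_{m,1}c_1 + h_{m,2}c_2 + \cdots + h_{m,n}c_n \end{pmatrix}_D$$

Proof This formalizes Example 1.1. See Exercise 29. QED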



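1.5 Definition The matrix-vector product of an $m \times n$ matrix and an $n \times 1$ vector
is this.

$$\begin{pmatrix} a_{1,1} & a_{1,2} & \ldots & a_{1,n} \\ a_{2,1} & a_{2,2} & \ldots & a_{2,n} \\ & \vdots & & \\ a_{m,1} & a_{m,2} & \ldots & a_{m,n} \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = \begin{pmatrix} a_{1,1}c_1 + \cdots + a_{1,n}c_n \\ a_{2,1}c_1 + \cdots + a_{2,n}c_n \\ \vdots \\ a_{m,1}c_1 + \cdots + a_{m,n}c_n \end{pmatrix}$$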

Briefly, application of a linear map is represented by the matrix-vector 
product of the map's representative and the vector's representative. 
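As a minimal sketch of Definition 1.5, the following computes a matrix-vector product entry by entry and applies it to the matrix of Example 1.1. The function name `mat_vec` and the choice $c_1 = 1$, $c_2 = 2$ are illustrative assumptions, not part of the text; exact fractions are used so that the entry $-1/2$ is not rounded.

```python
from fractions import Fraction

def mat_vec(matrix, vector):
    """Matrix-vector product: entry i is the dot product of row i with the vector."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("matrix width must equal vector height")
    return [sum(a * c for a, c in zip(row, vector)) for row in matrix]

# Matrix of Example 1.1: its columns are Rep_D(h(beta_1)) and Rep_D(h(beta_2)).
H = [[Fraction(0), Fraction(1)],
     [Fraction(-1, 2), Fraction(-1)],
     [Fraction(1), Fraction(0)]]

rep_v = [Fraction(1), Fraction(2)]   # Rep_B(v), here with c1 = 1 and c2 = 2
rep_hv = mat_vec(H, rep_v)           # Rep_D(h(v))
print(rep_hv)
```

The size check mirrors the requirement that the width of the matrix equal the height of the vector; a mismatched pair raises an error rather than returning a value.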

1.6 Remark In some sense Theorem 1.4 is not at all surprising because we chose 
the matrix representative in Definition 1.2 precisely to make Theorem 1.4 true. 
If the theorem were not true then we would adjust the definition. Nonetheless, 
we need the verification that the definition is right. 

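1.7 Example For the matrix from Example 1.3 we can calculate where that map
sends this vector.

$$\vec{v} = \begin{pmatrix} 4 \\ 1 \\ 0 \end{pmatrix}$$

With respect to the domain basis $B$ the representation of this vector is

$$\operatorname{Rep}_B(\vec{v}) = \begin{pmatrix} 0 \\ 1/2 \\ 2 \end{pmatrix}_B$$

and so the matrix-vector product gives the representation of the value $h(\vec{v})$ with
respect to the codomain basis $D$.

$$\operatorname{Rep}_D(h(\vec{v})) = \begin{pmatrix} -1/2 & 1 & 2 \\ -1/2 & -1 & -2 \end{pmatrix}_{B,D} \begin{pmatrix} 0 \\ 1/2 \\ 2 \end{pmatrix}_B = \begin{pmatrix} (-1/2) \cdot 0 + 1 \cdot (1/2) + 2 \cdot 2 \\ (-1/2) \cdot 0 - 1 \cdot (1/2) - 2 \cdot 2 \end{pmatrix}_D = \begin{pmatrix} 9/2 \\ -9/2 \end{pmatrix}_D$$

To find $h(\vec{v})$ itself, not its representation, take $(9/2)(1 + x) - (9/2)(-1 + x) = 9$.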
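The steps of Examples 1.3 and 1.7 can be sketched in code: apply the map to each domain basis vector, represent each image with respect to the codomain basis, adjoin the columns, and multiply. This is only an illustration under stated assumptions: the domain basis `B` below is assumed to be the one intended in Example 1.3, polynomials $p + qx$ are stored as pairs `(p, q)`, and the helper names are hypothetical.

```python
from fractions import Fraction

def h(a1, a2, a3):
    """The map (a1, a2, a3) |-> (2*a1 + a2) + (-a3)x, stored as (constant, x-coefficient)."""
    return (2 * a1 + a2, -a3)

def rep_D(poly):
    """Represent p + q*x with respect to D = <1 + x, -1 + x>.

    Solving c1*(1 + x) + c2*(-1 + x) = p + q*x gives c1 = (p + q)/2, c2 = (q - p)/2.
    """
    p, q = poly
    return (Fraction(p + q, 2), Fraction(q - p, 2))

B = [(0, 0, 1), (0, 2, 0), (2, 0, 0)]   # assumed domain basis

# Adjoin the columns Rep_D(h(beta_j)) to form Rep_{B,D}(h).
columns = [rep_D(h(*beta)) for beta in B]
H = [[col[i] for col in columns] for i in range(2)]

rep_v = [Fraction(0), Fraction(1, 2), Fraction(2)]               # Rep_B(v) from Example 1.7
rep_hv = [sum(a * c for a, c in zip(row, rep_v)) for row in H]   # Rep_D(h(v))

# Expand the representation back into a polynomial: c1*(1 + x) + c2*(-1 + x).
c1, c2 = rep_hv
h_v = (c1 - c2, c1 + c2)   # (constant term, coefficient of x)
```

Comparing `h_v` with a direct evaluation of `h` on the vector with components 4, 1, 0 is a quick check that the matrix route and the formula agree.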



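1.8 Example Let $\pi\colon \mathbb{R}^3 \to \mathbb{R}^2$ be projection onto the xy-plane. To give a matrix
representing this map, we first fix some bases.

$$B = \langle \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \rangle \qquad D = \langle \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix} \rangle$$

For each vector in the domain's basis, we find its image under the map.

$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} 1 \\ 1 \end{pmatrix} \qquad \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} -1 \\ 0 \end{pmatrix}$$

Then we find the representation of each image with respect to the codomain's
basis.

$$\operatorname{Rep}_D\Bigl(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\Bigr) = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \qquad \operatorname{Rep}_D\Bigl(\begin{pmatrix} 1 \\ 1 \end{pmatrix}\Bigr) = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad \operatorname{Rep}_D\Bigl(\begin{pmatrix} -1 \\ 0 \end{pmatrix}\Bigr) = \begin{pmatrix} -1 \\ 1 \end{pmatrix}$$

Finally, adjoining these representations gives the matrix representing $\pi$ with
respect to $B, D$.

$$\operatorname{Rep}_{B,D}(\pi) = \begin{pmatrix} 1 & 0 & -1 \\ -1 & 1 & 1 \end{pmatrix}_{B,D}$$

We can illustrate Theorem 1.4 by computing the matrix-vector product representing
the following statement about the projection map.

$$\pi\Bigl(\begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}\Bigr) = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$$

Representing this vector from the domain with respect to the domain's basis

$$\operatorname{Rep}_B\Bigl(\begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}\Bigr) = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}_B$$

gives this matrix-vector product.

$$\operatorname{Rep}_D\Bigl(\pi\Bigl(\begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}\Bigr)\Bigr) = \begin{pmatrix} 1 & 0 & -1 \\ -1 & 1 & 1 \end{pmatrix}_{B,D} \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}_B = \begin{pmatrix} 0 \\ 2 \end{pmatrix}_D$$

Expanding this representation into a linear combination of vectors from $D$

$$0 \cdot \begin{pmatrix} 2 \\ 1 \end{pmatrix} + 2 \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$$

checks that the map's action is indeed reflected in the operation of the matrix.
(We will sometimes compress these three displayed equations into one

$$\begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}_B \overset{h}{\underset{H}{\longmapsto}} \begin{pmatrix} 0 \\ 2 \end{pmatrix}_D = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$$

in the course of a calculation.)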

We now have two ways to compute the effect of projection, the straightforward
formula that drops each three-tall vector's third component to make a
two-tall vector, and the above formula that uses representations and
matrix-vector multiplication. Compared to the first way, the second way might seem
complicated. However, it has advantages. The next example shows that this
new scheme simplifies the formula for some maps.



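1.9 Example To represent a rotation map $t_\theta\colon \mathbb{R}^2 \to \mathbb{R}^2$ that turns all vectors in
the plane counterclockwise through an angle $\theta$
we start by fixing bases. Using $\mathcal{E}_2$ both as a domain basis and as a codomain
basis is natural. Now, we find the image under the map of each vector in the
domain's basis.

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \overset{t_\theta}{\longmapsto} \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} \qquad \begin{pmatrix} 0 \\ 1 \end{pmatrix} \overset{t_\theta}{\longmapsto} \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}$$

Then we represent these images with respect to the codomain's basis. Because
this basis is $\mathcal{E}_2$, vectors represent themselves. Adjoining the representations
gives the matrix representing the map.

$$\operatorname{Rep}_{\mathcal{E}_2,\mathcal{E}_2}(t_\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

The advantage of this scheme is that by knowing how to represent the image of
just the two basis vectors we get a formula for the image of any vector at all;
here we rotate a vector by $\theta = \pi/6$.

$$\begin{pmatrix} 3 \\ -2 \end{pmatrix} \overset{t_{\pi/6}}{\longmapsto} \begin{pmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{pmatrix} \begin{pmatrix} 3 \\ -2 \end{pmatrix} \approx \begin{pmatrix} 3.598 \\ -0.232 \end{pmatrix}$$

(We are again using the fact that with respect to the standard basis, vectors
represent themselves.)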
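A short numerical sketch of the rotation example, assuming the standard basis on both sides so that vectors represent themselves. The function names are illustrative only.

```python
import math

def rotation_matrix(theta):
    """Matrix of counterclockwise rotation by theta; its columns are the images of e1 and e2."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

def mat_vec(matrix, vector):
    """Matrix-vector product as dot products of the rows with the vector."""
    return [sum(a * c for a, c in zip(row, vector)) for row in matrix]

# Rotate the vector with components 3 and -2 through the angle pi/6.
image = mat_vec(rotation_matrix(math.pi / 6), [3, -2])
print([round(x, 3) for x in image])
```

Because a rotation preserves length, comparing `math.hypot(*image)` with `math.hypot(3, -2)` gives a simple sanity check on the matrix.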

1.10 Example In the definition of matrix-vector product the width of the matrix 
equals the height of the vector. Hence, the first product below is defined while 
the second is not. 





One reason that this product is not defined is the purely formal one that the 
definition requires that the sizes match and these sizes don't match. Behind 
the formality, though, is a sensible reason to leave it undefined: the three-wide 
matrix represents a map with a three-dimensional domain while the two-tall 
vector represents a member of a two-dimensional space. 

Earlier we saw the operations of addition and scalar multiplication
of matrices and the dot product of vectors. Matrix-vector multiplication is a new
operation in the arithmetic of vectors and matrices. Nothing in Definition 1.5 
requires us to view it in terms of representations. We can get some insight by 
focusing on how the entries combine. 

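A good way to view matrix-vector product is as the dot products of the rows
of the matrix with the column vector.

$$\begin{pmatrix} & \vdots & \\ a_{i,1} & a_{i,2} \;\; \ldots & a_{i,n} \\ & \vdots & \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} = \begin{pmatrix} \vdots \\ a_{i,1}c_1 + a_{i,2}c_2 + \cdots + a_{i,n}c_n \\ \vdots \end{pmatrix}$$

Looked at in this row-by-row way, this new operation generalizes dot product.

We can also view the operation column-by-column.

$$\begin{aligned}
\begin{pmatrix} h_{1,1} & h_{1,2} & \ldots & h_{1,n} \\ h_{2,1} & h_{2,2} & \ldots & h_{2,n} \\ & \vdots & & \\ h_{m,1} & h_{m,2} & \ldots & h_{m,n} \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}
&= \begin{pmatrix} h_{1,1}c_1 + h_{1,2}c_2 + \cdots + h_{1,n}c_n \\ h_{2,1}c_1 + h_{2,2}c_2 + \cdots + h_{2,n}c_n \\ \vdots \\ h_{m,1}c_1 + h_{m,2}c_2 + \cdots + h_{m,n}c_n \end{pmatrix} \\
&= c_1 \begin{pmatrix} h_{1,1} \\ h_{2,1} \\ \vdots \\ h_{m,1} \end{pmatrix} + \cdots + c_n \begin{pmatrix} h_{1,n} \\ h_{2,n} \\ \vdots \\ h_{m,n} \end{pmatrix}
\end{aligned}$$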

1.11 Example 





The result has the columns of the matrix weighted by the entries of the vector.
This way of looking at it brings us back to the objective stated at the start of
this section, to compute $h(c_1\vec{\beta}_1 + \cdots + c_n\vec{\beta}_n)$ as $c_1h(\vec{\beta}_1) + \cdots + c_nh(\vec{\beta}_n)$.

We began this section by noting that the equality of these two enables us to
compute the action of $h$ on any argument knowing only $h(\vec{\beta}_1), \ldots, h(\vec{\beta}_n)$. We
have developed this into a scheme to compute the action of the map by taking
the matrix-vector product of the matrix representing the map with the vector
representing the argument. In this way, with respect to any bases, any linear
map has a matrix representing it. The next subsection will show the converse,
that if we fix bases then for any matrix there is an associated linear map.
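The row-by-row and column-by-column views described above can be compared directly. In this sketch the matrix `A` and vector `c` are hypothetical values chosen only for illustration, and both function names are assumptions.

```python
def by_rows(matrix, vector):
    """Row view: entry i is the dot product of row i with the vector."""
    return [sum(a * c for a, c in zip(row, vector)) for row in matrix]

def by_columns(matrix, vector):
    """Column view: a linear combination of the columns, weighted by the vector's entries."""
    result = [0] * len(matrix)
    for j, c in enumerate(vector):
        for i in range(len(matrix)):
            result[i] += c * matrix[i][j]
    return result

A = [[1, 0, -1],
     [2, 0, 3]]        # hypothetical 2x3 matrix
c = [2, -1, 1]         # hypothetical 3-tall vector

print(by_rows(A, c), by_columns(A, c))
```

Both functions perform the same multiplications and additions, only grouped differently, which is why the two views always agree.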

Exercises 

/ 1.12 Multiply the matrix 

(o J 2) 

\1 1 0/ 



by each vector (or state "not defined"). 

"'0 "'(3 

1.13 Perform, if possible, each matrix-vector multiplication 

(a) c (b) ' ' 



1 

-2 




(c) 




/ 1.14 Solve this matrix equation. 

(I : 

\^ -1

/ 1.15 For a homomorphism from $\mathcal{P}_2$ to $\mathcal{P}_3$ that sends

$1 \mapsto 1 + x$, $x \mapsto 1 + 2x$, and $x^2 \mapsto x - x^3$
where does $1 - 3x + 2x^2$ go?
/ 1.16 Assume that h: ^ is determined by this action. 

Using the standard bases, find 

(a) the matrix representing this map; 

(b) a general formula for h(v) . 

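/ 1.17 Let $d/dx\colon \mathcal{P}_3 \to \mathcal{P}_3$ be the derivative transformation.

(a) Represent $d/dx$ with respect to $B, B$ where $B = \langle 1, x, x^2, x^3 \rangle$.

(b) Represent $d/dx$ with respect to $B, D$ where $D = \langle 1, 2x, 3x^2, 4x^3 \rangle$.
/ 1.18 Represent each linear map with respect to each pair of bases.

(a) $d/dx\colon \mathcal{P}_n \to \mathcal{P}_n$ with respect to $B, B$ where $B = \langle 1, x, \ldots, x^n \rangle$, given by

$a_0 + a_1x + a_2x^2 + \cdots + a_nx^n \mapsto a_1 + 2a_2x + \cdots + na_nx^{n-1}$

(b) $\int\colon \mathcal{P}_n \to \mathcal{P}_{n+1}$ with respect to $B_n, B_{n+1}$ where $B_i = \langle 1, x, \ldots, x^i \rangle$, given by

$a_0 + a_1x + a_2x^2 + \cdots + a_nx^n \mapsto a_0x + \frac{a_1}{2}x^2 + \cdots + \frac{a_n}{n+1}x^{n+1}$

(c) $\int_0^1\colon \mathcal{P}_n \to \mathbb{R}$ with respect to $B, \mathcal{E}_1$ where $B = \langle 1, x, \ldots, x^n \rangle$ and $\mathcal{E}_1 = \langle 1 \rangle$, given
by

$a_0 + a_1x + a_2x^2 + \cdots + a_nx^n \mapsto a_0 + \frac{a_1}{2} + \cdots + \frac{a_n}{n+1}$

(d) $\text{eval}_3\colon \mathcal{P}_n \to \mathbb{R}$ with respect to $B, \mathcal{E}_1$ where $B = \langle 1, x, \ldots, x^n \rangle$ and $\mathcal{E}_1 = \langle 1 \rangle$,
given by

$a_0 + a_1x + a_2x^2 + \cdots + a_nx^n \mapsto a_0 + a_1 \cdot 3 + a_2 \cdot 3^2 + \cdots + a_n \cdot 3^n$

(e) $\text{slide}_{-1}\colon \mathcal{P}_n \to \mathcal{P}_n$ with respect to $B, B$ where $B = \langle 1, x, \ldots, x^n \rangle$, given by

$a_0 + a_1x + a_2x^2 + \cdots + a_nx^n \mapsto a_0 + a_1 \cdot (x + 1) + \cdots + a_n \cdot (x + 1)^n$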

1.19 Represent the identity map on any nontrivial space with respect to B, B, where 
B is any basis. 

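1.20 Represent, with respect to the natural basis, the transpose transformation on
the space $\mathcal{M}_{2 \times 2}$ of $2 \times 2$ matrices.

1.21 Assume that $B = \langle \vec{\beta}_1, \vec{\beta}_2, \vec{\beta}_3, \vec{\beta}_4 \rangle$ is a basis for a vector space. Represent with
respect to $B, B$ the transformation that is determined by each.

(a) $\vec{\beta}_1 \mapsto \vec{\beta}_2$, $\vec{\beta}_2 \mapsto \vec{\beta}_3$, $\vec{\beta}_3 \mapsto \vec{\beta}_4$, $\vec{\beta}_4 \mapsto \vec{0}$

(b) $\vec{\beta}_1 \mapsto \vec{\beta}_2$, $\vec{\beta}_2 \mapsto \vec{0}$, $\vec{\beta}_3 \mapsto \vec{\beta}_4$, $\vec{\beta}_4 \mapsto \vec{0}$

(c) $\vec{\beta}_1 \mapsto \vec{\beta}_2$, $\vec{\beta}_2 \mapsto \vec{\beta}_3$, $\vec{\beta}_3 \mapsto \vec{0}$, $\vec{\beta}_4 \mapsto \vec{0}$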

1.22 Example 1.9 shows how to represent the rotation transformation of the plane 
with respect to the standard basis. Express these other transformations also with 
respect to the standard basis. 

(a) the dilation map $d_s$, which multiplies all vectors by the same scalar $s$

(b) the reflection map $f_\ell$, which reflects all vectors across a line $\ell$ through the
origin

/ 1.23 Consider a linear transformation of R^ determined by these two. 



(a) Represent this transformation with respect to the standard bases.

(b) Where does the transformation send this vector?

(c) Represent this transformation with respect to these bases. 



(d) Using B from the prior item, represent the transformation with respect to 
B,B. 

1.24 Suppose that $h\colon V \to W$ is one-to-one so that by Theorem 2.21, for any basis $B =
\langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle \subset V$ the image $h(B) = \langle h(\vec{\beta}_1), \ldots, h(\vec{\beta}_n) \rangle$ is a basis for $W$.

(a) Represent the map $h$ with respect to $B, h(B)$.

(b) For a member $\vec{v}$ of the domain, where the representation of $\vec{v}$ has components
$c_1, \ldots, c_n$, represent the image vector $h(\vec{v})$ with respect to the image basis $h(B)$.

1.25 Give a formula for the product of a matrix and $\vec{e}_i$, the column vector that is
all zeroes except for a single one in the $i$-th position.

/ 1.26 For each vector space of functions of one real variable, represent the derivative 
transformation with respect to B,B. 

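(a) $\{ a\cos x + b\sin x \mid a, b \in \mathbb{R} \}$, $B = \langle \cos x, \sin x \rangle$

(b) $\{ ae^x + be^{2x} \mid a, b \in \mathbb{R} \}$, $B = \langle e^x, e^{2x} \rangle$

(c) $\{ a + bx + ce^x + dxe^x \mid a, b, c, d \in \mathbb{R} \}$, $B = \langle 1, x, e^x, xe^x \rangle$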

1.27 Find the range of the linear transformation of represented with respect to 
the standard bases by each matrix. 

(a) I ! 1) (b) \) (c) a matrix of the form | }' 

/ 1.28 Can one matrix represent two different linear maps? That is, can $\operatorname{Rep}_{B,D}(h) = \operatorname{Rep}_{\hat{B},\hat{D}}(\hat{h})$?

1.29 Prove Theorem 1.4. 
/ 1.30 Example 1.9 shows how to represent rotation of all vectors in the plane through
an angle $\theta$ about the origin, with respect to the standard bases.

(a) Rotation of all vectors in three-space through an angle $\theta$ about the x-axis is a
transformation of $\mathbb{R}^3$. Represent it with respect to the standard bases. Arrange
the rotation so that to someone whose feet are at the origin and whose head is
at (1,0,0), the movement appears clockwise.

(b) Repeat the prior item, only rotate about the y-axis instead. (Put the person's
head at $\vec{e}_2$.)

(c) Repeat, about the z-axis.

(d) Extend the prior item to $\mathbb{R}^4$. (Hint: we can restate 'rotate about the z-axis'
as 'rotate parallel to the xy-plane'.)

1.31 (Schur's Triangularization Lemma) 

(a) Let $U$ be a subspace of $V$ and fix bases $B_U \subseteq B_V$. What is the relationship
between the representation of a vector from $U$ with respect to $B_U$ and the
representation of that vector (viewed as a member of $V$) with respect to $B_V$?

(b) What about maps?

(c) Fix a basis $B = \langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ for $V$ and observe that the spans

$$[\{\vec{0}\}] = \{\vec{0}\} \subset [\{\vec{\beta}_1\}] \subset [\{\vec{\beta}_1, \vec{\beta}_2\}] \subset \cdots \subset [B] = V$$

form a strictly increasing chain of subspaces. Show that for any linear map
$h\colon V \to W$ there is a chain $W_0 = \{\vec{0}\} \subseteq W_1 \subseteq \cdots \subseteq W_m = W$ of subspaces of $W$
such that

$$h([\{\vec{\beta}_1, \ldots, \vec{\beta}_i\}]) \subseteq W_i$$

for each $i$.

(d) Conclude that for every linear map $h\colon V \to W$ there are bases $B, D$ so the
matrix representing $h$ with respect to $B, D$ is upper-triangular (that is, each
entry $h_{i,j}$ with $i > j$ is zero).

(e) Is an upper-triangular representation unique? 



III.2 Any Matrix Represents a Linear Map



The prior subsection shows that the action of a linear map $h$ is described by a
matrix $H$, with respect to appropriate bases, in this way.

$$\vec{v} = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}_B \overset{h}{\underset{H}{\longmapsto}} h(\vec{v}) = \begin{pmatrix} h_{1,1}v_1 + \cdots + h_{1,n}v_n \\ \vdots \\ h_{m,1}v_1 + \cdots + h_{m,n}v_n \end{pmatrix}_D \qquad (*)$$

Here we will show the converse, that each matrix represents a linear map.
So we start with a matrix

$$H = \begin{pmatrix} h_{1,1} & h_{1,2} & \ldots & h_{1,n} \\ h_{2,1} & h_{2,2} & \ldots & h_{2,n} \\ & \vdots & & \\ h_{m,1} & h_{m,2} & \ldots & h_{m,n} \end{pmatrix}$$

and we will describe how it defines a map $h$. We require that the map be
represented by the matrix so first note that in $(*)$ the dimension of the map's
domain is the number of columns $n$ of the matrix and the dimension of the
codomain is the number of rows $m$. Thus, for $h$'s domain fix an $n$-dimensional
vector space $V$ and for the codomain fix an $m$-dimensional space $W$. Also fix
bases $B = \langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ and $D = \langle \vec{\delta}_1, \ldots, \vec{\delta}_m \rangle$ for those spaces.

Now let $h\colon V \to W$ be: where $\vec{v}$ in the domain has the representation

$$\operatorname{Rep}_B(\vec{v}) = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}_B$$

then its image $h(\vec{v})$ is the member of the codomain with this representation.

$$\operatorname{Rep}_D(h(\vec{v})) = \begin{pmatrix} h_{1,1}v_1 + \cdots + h_{1,n}v_n \\ \vdots \\ h_{m,1}v_1 + \cdots + h_{m,n}v_n \end{pmatrix}_D$$

That is, to compute the action of $h$ on any $\vec{v} \in V$, first express $\vec{v}$ with respect to
the basis $\vec{v} = v_1\vec{\beta}_1 + \cdots + v_n\vec{\beta}_n$ and then $h(\vec{v}) = (h_{1,1}v_1 + \cdots + h_{1,n}v_n) \cdot \vec{\delta}_1 +
\cdots + (h_{m,1}v_1 + \cdots + h_{m,n}v_n) \cdot \vec{\delta}_m$.

Above we have made some arbitrary choices, for instance $V$ can be any
$n$-dimensional space and $B$ could be any basis for $V$, so $H$ does not define a
unique function. However, note also that once we have fixed $V$, $B$, $W$, and $D$
then $h$ is well-defined* since $\vec{v}$ has a unique representation with respect to the
basis $B$ by Theorem II.1.12 and the calculation of $\vec{w}$ from its representation is
also uniquely determined.

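2.1 Example Consider this matrix.

$$H = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}$$

It is $3 \times 2$ so any map that it defines must carry a dimension 2 domain to a
dimension 3 codomain. Let the domain and codomain be $\mathbb{R}^2$ and $\mathcal{P}_2$, with these
bases.

$$B = \langle \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \rangle \qquad D = \langle x^2, x^2 + x, x^2 + x + 1 \rangle$$

Let $h\colon \mathbb{R}^2 \to \mathcal{P}_2$ be the function defined by $H$ and we will compute the image
under $h$ of this member of the domain.

$$\vec{v} = \begin{pmatrix} -3 \\ 2 \end{pmatrix}$$

We have

$$\operatorname{Rep}_D(h(\vec{v})) = H \cdot \operatorname{Rep}_B(\vec{v}) = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} \begin{pmatrix} -1/2 \\ -5/2 \end{pmatrix}_B = \begin{pmatrix} -11/2 \\ -23/2 \\ -35/2 \end{pmatrix}_D$$

From its representation, computation of $\vec{w}$ is routine: $(-11/2)(x^2) - (23/2)(x^2 +
x) - (35/2)(x^2 + x + 1) = (-69/2)x^2 - (58/2)x - (35/2)$.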
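The computation in this example can be sketched as follows. The representation `rep_v` is taken as the one that yields the displayed result for this matrix, and storing each polynomial of the codomain basis as a triple of (constant, $x$, $x^2$) coefficients is an assumption made only for this illustration.

```python
from fractions import Fraction as F

H = [[1, 2],
     [3, 4],
     [5, 6]]
rep_v = [F(-1, 2), F(-5, 2)]     # Rep_B(v), taken as given

# Rep_D(h(v)) = H * Rep_B(v)
rep_w = [sum(a * c for a, c in zip(row, rep_v)) for row in H]

# D = <x^2, x^2 + x, x^2 + x + 1>, each stored as (constant, x, x^2) coefficients.
D = [(0, 0, 1), (0, 1, 1), (1, 1, 1)]

# Expand the representation into the polynomial w = h(v).
w = tuple(sum(r * d[k] for r, d in zip(rep_w, D)) for k in range(3))
print(rep_w, w)
```

Choosing a different codomain basis `D` with the same `H` and `rep_v` would leave `rep_w` unchanged but give a different polynomial `w`, which is the sense in which the matrix alone does not determine the map.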

2.2 Theorem Any matrix represents a homomorphism between vector spaces of 
appropriate dimensions, with respect to any pair of bases. 

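Proof We must check that for any matrix $H$ and any domain and codomain
bases $B, D$, the defined map $h$ is linear. If $\vec{v}, \vec{u} \in V$ are such that

$$\operatorname{Rep}_B(\vec{v}) = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} \quad\text{and}\quad \operatorname{Rep}_B(\vec{u}) = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}$$

and $c, d \in \mathbb{R}$ then the calculation

$$\begin{aligned}
h(c\vec{v} + d\vec{u}) &= \bigl(h_{1,1}(cv_1 + du_1) + \cdots + h_{1,n}(cv_n + du_n)\bigr) \cdot \vec{\delta}_1 + \\
&\qquad \cdots + \bigl(h_{m,1}(cv_1 + du_1) + \cdots + h_{m,n}(cv_n + du_n)\bigr) \cdot \vec{\delta}_m \\
&= c \cdot h(\vec{v}) + d \cdot h(\vec{u})
\end{aligned}$$

supplies that check. QED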

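* More information on well-defined is in the appendix.

2.3 Example Even if the domain and codomain are the same, the map that the
matrix represents depends on the bases that we choose. If

$$H = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad B_1 = D_1 = \langle \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \rangle, \quad\text{and}\quad B_2 = D_2 = \langle \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix} \rangle,$$

then $h_1\colon \mathbb{R}^2 \to \mathbb{R}^2$ represented by $H$ with respect to $B_1, D_1$ maps

$$\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}_{B_1} \mapsto \begin{pmatrix} c_1 \\ 0 \end{pmatrix}_{D_1} = \begin{pmatrix} c_1 \\ 0 \end{pmatrix}$$

while $h_2\colon \mathbb{R}^2 \to \mathbb{R}^2$ represented by $H$ with respect to $B_2, D_2$ is this map.

$$\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} c_2 \\ c_1 \end{pmatrix}_{B_2} \mapsto \begin{pmatrix} c_2 \\ 0 \end{pmatrix}_{D_2} = \begin{pmatrix} 0 \\ c_2 \end{pmatrix}$$

These are different functions. The first is projection onto the x-axis, while the
second is projection onto the y-axis.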
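A small sketch can make the dependence on bases concrete. It assumes, for illustration only, a $2 \times 2$ matrix with a single 1 in the upper left and the standard basis of the plane taken in its two possible orders; with those assumptions the same matrix yields the two projections described above. The function `apply_map` and its argument layout are hypothetical.

```python
def apply_map(H, B, D, v):
    """Apply the map represented by the 2x2 matrix H with respect to bases B, D of the plane.

    B and D are pairs of basis vectors and v is a vector, all given as (x, y) pairs.
    """
    (b11, b21), (b12, b22) = B                 # components of beta_1 and beta_2
    det = b11 * b22 - b12 * b21
    # Rep_B(v): solve c1*beta_1 + c2*beta_2 = v by Cramer's rule.
    c1 = (v[0] * b22 - b12 * v[1]) / det
    c2 = (b11 * v[1] - v[0] * b21) / det
    # Rep_D(h(v)) = H * Rep_B(v)
    r1 = H[0][0] * c1 + H[0][1] * c2
    r2 = H[1][0] * c1 + H[1][1] * c2
    # Expand the representation with respect to D.
    return (r1 * D[0][0] + r2 * D[1][0], r1 * D[0][1] + r2 * D[1][1])

H = [[1, 0],
     [0, 0]]                                   # assumed matrix
standard = ((1, 0), (0, 1))                    # standard basis in the usual order
swapped = ((0, 1), (1, 0))                     # same vectors, opposite order

first = apply_map(H, standard, standard, (3, 5))
second = apply_map(H, swapped, swapped, (3, 5))
print(first, second)
```

The first call keeps only the x component of the input and the second keeps only the y component, even though `H` is the same in both calls.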

This result means that we can, when convenient, work solely with matrices,
just doing the computations without having to worry whether a matrix of interest
represents a linear map on some pair of spaces. When we are working with a
matrix but we do not have particular spaces or bases in mind then we often
take the domain and codomain to be $\mathbb{R}^n$ and $\mathbb{R}^m$ and use the standard bases.
This is convenient because with the standard bases vector representation is
transparent: the representation of $\vec{v}$ is $\vec{v}$. (In this case the column space of the
matrix equals the range of the map and consequently the column space of $H$ is
often denoted by $\mathscr{R}(H)$.)

We finish this section by illustrating how a matrix can give us information 
about the associated maps. 

2.4 Theorem The rank of a matrix equals the rank of any map that it represents. 

Proof Suppose that the matrix $H$ is $m \times n$. Fix domain and codomain spaces
$V$ and $W$ of dimension $n$ and $m$ with bases $B = \langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ and $D$. Then $H$
represents some linear map $h$ between those spaces with respect to these bases
whose range space

$$\begin{aligned}
\{ h(\vec{v}) \mid \vec{v} \in V \} &= \{ h(c_1\vec{\beta}_1 + \cdots + c_n\vec{\beta}_n) \mid c_1, \ldots, c_n \in \mathbb{R} \} \\
&= \{ c_1h(\vec{\beta}_1) + \cdots + c_nh(\vec{\beta}_n) \mid c_1, \ldots, c_n \in \mathbb{R} \}
\end{aligned}$$

is the span $[\{ h(\vec{\beta}_1), \ldots, h(\vec{\beta}_n) \}]$. The rank of the map $h$ is the dimension of this
range space.

The rank of the matrix is the dimension of its column space, the span of the
set of its columns $[\{ \operatorname{Rep}_D(h(\vec{\beta}_1)), \ldots, \operatorname{Rep}_D(h(\vec{\beta}_n)) \}]$.

To see that the two spans have the same dimension, recall from the proof
of Lemma I.2.5 that if we fix a basis then representation with respect to that
basis gives an isomorphism $\operatorname{Rep}_D\colon W \to \mathbb{R}^m$. Under this isomorphism there is a
linear relationship among members of the range space if and only if the same
relationship holds in the column space, e.g., $\vec{0} = c_1 \cdot h(\vec{\beta}_1) + \cdots + c_n \cdot h(\vec{\beta}_n)$ if
and only if $\vec{0} = c_1 \cdot \operatorname{Rep}_D(h(\vec{\beta}_1)) + \cdots + c_n \cdot \operatorname{Rep}_D(h(\vec{\beta}_n))$. Hence, a subset of
the range space is linearly independent if and only if the corresponding subset
of the column space is linearly independent. Therefore the size of the largest
linearly independent subset of the range space equals the size of the largest
linearly independent subset of the column space, and so the two spaces have the
same dimension. QED

2.5 Example Any map represented by

$$\begin{pmatrix} 1 & 2 & 2 \\ 1 & 2 & 1 \\ 0 & 0 & 3 \\ 0 & 0 & 2 \end{pmatrix}$$

must be from a three-dimensional domain to a four-dimensional codomain. In
addition, because the rank of this matrix is two (we can spot this by eye or get it
with Gauss's Method), any map represented by this matrix has a two-dimensional
range space.

2.6 Corollary Let $h$ be a linear map represented by a matrix $H$. Then $h$ is onto
if and only if the rank of $H$ equals the number of its rows, and $h$ is one-to-one if
and only if the rank of $H$ equals the number of its columns.

Proof For the onto half, the dimension of the range space of $h$ is the rank
of $h$, which equals the rank of $H$ by the theorem. Since the dimension of the
codomain of $h$ equals the number of rows in $H$, if the rank of $H$ equals the
number of rows then the dimension of the range space equals the dimension
of the codomain. But a subspace with the same dimension as its superspace
must equal that superspace (because any basis for the range space is a linearly
independent subset of the codomain whose size is equal to the dimension of the
codomain, and thus this basis for the range space must also be a basis for the
codomain).

For the other half, a linear map is one-to-one if and only if it is an isomorphism
between its domain and its range, that is, if and only if its domain has the same
dimension as its range. But the number of columns in $H$ is the dimension of $h$'s
domain, and by the theorem the rank of $H$ equals the dimension of $h$'s range.
QED
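The rank tests in the corollary are easy to try out. The sketch below computes rank by row reduction with exact fractions and then compares it with the number of rows and columns. The function names and the sample matrices are illustrative assumptions; the first sample is simply a $4 \times 3$ matrix of rank two.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by Gauss's Method: count the pivots found during row reduction."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        for r in range(pivot_row + 1, n_rows):
            factor = rows[r][col] / rows[pivot_row][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return pivot_row

def is_onto(matrix):
    """A represented map is onto exactly when the rank equals the number of rows."""
    return rank(matrix) == len(matrix)

def is_one_to_one(matrix):
    """A represented map is one-to-one exactly when the rank equals the number of columns."""
    return rank(matrix) == len(matrix[0])

tall = [[1, 2, 2], [1, 2, 1], [0, 0, 3], [0, 0, 2]]   # illustrative 4x3 matrix of rank two
print(rank(tall), is_onto(tall), is_one_to_one(tall))
```

For a square matrix the two tests coincide, which matches the observation that a map between spaces of the same dimension is onto if and only if it is one-to-one.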

The above results settle the apparent ambiguity in our use of the same word 
'rank' to apply both to matrices and to maps. 

2.7 Definition A linear map that is one-to-one and onto is nonsingular, otherwise
it is singular. That is, a linear map is nonsingular if and only if it is an
isomorphism.

2.8 Remark Some authors use 'nonsingular' as a synonym for one-to-one while
others use it the way that we have here. The difference is slight because any
map is onto its range space, so a one-to-one map is an isomorphism with its
range.

In the first chapter we defined a matrix to be nonsingular if it is square and
is the matrix of coefficients of a linear system with a unique solution. The next
result justifies our dual use of the term.

2.9 Lemma A nonsingular linear map is represented by a square matrix. A
square matrix represents nonsingular maps if and only if it is a nonsingular
matrix. Thus, a matrix represents isomorphisms if and only if it is square and
nonsingular.

Proof Assume that the map $h\colon V \to W$ is nonsingular. Corollary 2.6 says that
for any matrix $H$ representing that map, because $h$ is onto the number of rows
of $H$ equals the rank of $H$ and because $h$ is one-to-one the number of columns
of $H$ is also equal to the rank of $H$. Thus $H$ is square.

Next assume that $H$ is square, $n \times n$. The matrix $H$ is nonsingular if
and only if its row rank is $n$, which is true if and only if $H$'s rank is $n$ by
Theorem Two.III.3.11, which is true if and only if $h$'s rank is $n$ by Theorem 2.4,
which is true if and only if $h$ is an isomorphism by Theorem I.2.3. (The last
holds because the domain of $h$ is $n$-dimensional as it is the number of columns
in $H$.) QED

2.10 Example Any map from to represented with respect to any pair of
bases by



is singular because this matrix is singular.

2.11 Example Any map $g\colon V \to W$ represented by



is nonsingular because this matrix has rank two.

We've now seen that the relationship between maps and matrices goes both
ways: for a particular pair of bases, any linear map is represented by a matrix
and any matrix describes a linear map. That is, by fixing spaces and bases we
get a correspondence between maps and matrices. In the rest of this chapter
we will explore this correspondence. For instance, we've defined for linear maps
the operations of addition and scalar multiplication and we shall see what the
corresponding matrix operations are. We shall also see the matrix operation
that represents the map operation of composition. And, we shall see how to find
the matrix that represents a map's inverse.



Exercises 

/ 2.12 Let h. be the linear map defined by this matrix on the domain and 
codomain with respect to the given bases. 

"-(4 2) B HI +x,x),D =((;),(;)) 

What is the image under h of the vector v = 2x — 1 ? 
/ 2.13 Decide if each vector lies in the range of the map from to R^ represented 
with respect to the standard bases by the matrix. 

/ 2.14 Consider this matrix, representing a transformation of R^, and these bases for 
that space. 

i) -<(:).c> -<;)•(-;> 

(a) To what vector in the codomain is the first member of B mapped? 

(b) The second member? 

(c) Where is a general vector from the domain (a vector with components x and 
y) mapped? That is, what transformation of R^ is represented with respect to 
B, D by this matrix? 

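2.15 What transformation of $F = \{ a\cos\theta + b\sin\theta \mid a, b \in \mathbb{R} \}$ is represented with
respect to $B = \langle \cos\theta - \sin\theta, \sin\theta \rangle$ and $D = \langle \cos\theta + \sin\theta, \cos\theta \rangle$ by this matrix?

$$\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$$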

/ 2.16 Decide whether $1 + 2x$ is in the range of the map from $\mathbb{R}^3$ to $\mathcal{P}_2$ represented
with respect to $\mathcal{E}_3$ and $\langle 1, 1 + x^2, x \rangle$ by this matrix.

(i:;) 

2.17 Example 2.11 gives a matrix that is nonsingular and is therefore associated 
with maps that are nonsingular. 

(a) Find the set of column vectors representing the members of the null space of 
any map represented by this matrix. 

(b) Find the nullity of any such map. 

(c) Find the set of column vectors representing the members of the range space 
of any map represented by this matrix. 

(d) Find the rank of any such map. 

(e) Check that rank plus nullity equals the dimension of the domain. 

2.18 This is an alternative proof of Lemma 2.9. Given an $n \times n$ matrix $H$, fix a
domain $V$ and codomain $W$ of appropriate dimension $n$, and bases $B, D$ for those
spaces, and consider the map $h$ represented by the matrix.

(a) Show that $h$ is onto if and only if there is at least one $\operatorname{Rep}_B(\vec{v})$ associated by
$H$ with each $\operatorname{Rep}_D(\vec{w})$.

(b) Show that $h$ is one-to-one if and only if there is at most one $\operatorname{Rep}_B(\vec{v})$ associated
by $H$ with each $\operatorname{Rep}_D(\vec{w})$.

(c) Consider the linear system $H \cdot \operatorname{Rep}_B(\vec{v}) = \operatorname{Rep}_D(\vec{w})$. Show that $H$ is nonsingular
if and only if there is exactly one solution $\operatorname{Rep}_B(\vec{v})$ for each $\operatorname{Rep}_D(\vec{w})$.

/ 2.19 Because the rank of a matrix equals the rank of any map it represents, if
one matrix represents two different maps $H = \operatorname{Rep}_{B,D}(h) = \operatorname{Rep}_{\hat{B},\hat{D}}(\hat{h})$ (where
$h, \hat{h}\colon V \to W$) then the dimension of the range space of $h$ equals the dimension of
the range space of $\hat{h}$. Must these equal-dimensioned range spaces actually be the
same?

/ 2.20 Let $V$ be an $n$-dimensional space with bases $B$ and $D$. Consider a map that
sends, for $\vec{v} \in V$, the column vector representing $\vec{v}$ with respect to $B$ to the column
vector representing $\vec{v}$ with respect to $D$. Show that map is a linear transformation
of $\mathbb{R}^n$.

2.21 Example 2.3 shows that changing the pair of bases can change the map that
a matrix represents, even though the domain and codomain remain the same.
Could the map ever not change? Is there a matrix $H$, vector spaces $V$ and $W$,
and associated pairs of bases $B_1, D_1$ and $B_2, D_2$ (with $B_1 \neq B_2$ or $D_1 \neq D_2$ or
both) such that the map represented by $H$ with respect to $B_1, D_1$ equals the map
represented by $H$ with respect to $B_2, D_2$?

/ 2.22 A square matrix is a diagonal matrix if it is all zeroes except possibly for the 
entries on its upper-left to lower-right diagonal — its 1, 1 entry, its 2,2 entry, etc. 
Show that a linear map is an isomorphism if there are bases such that, with respect 
to those bases, the map is represented by a diagonal matrix with no zeroes on the 
diagonal. 

2.23 Describe geometrically the action on $\mathbb{R}^2$ of the map represented with respect
to the standard bases $\mathcal{E}_2, \mathcal{E}_2$ by this matrix.

/3 o^ 

[0 1) 

Do the same for these. 

1 0\ /O A /I 3 

oj \} 0) \0 1 

2.24 The fact that for any linear map the rank plus the nullity equals the dimension
of the domain shows that a necessary condition for the existence of a homomorphism
between two spaces, onto the second space, is that there be no gain in dimension.
That is, where $h\colon V \to W$ is onto, the dimension of $W$ must be less than or equal
to the dimension of $V$.

(a) Show that this (strong) converse holds: no gain in dimension implies that
there is a homomorphism and, further, any matrix with the correct size and
correct rank represents such a map.

(b) Are there bases for $\mathbb{R}^3$ such that this matrix

$$H = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

represents a map from $\mathbb{R}^3$ to $\mathbb{R}^3$ whose range is the xy plane subspace of $\mathbb{R}^3$?

2.25 Let V be an n-dimensional space and suppose that $x \in \mathbb{R}^n$. Fix a basis
B for V and consider the map $h_x \colon V \to \mathbb{R}$ given by the dot
product $v \mapsto x \cdot \text{Rep}_B(v)$.

(a) Show that this map is linear.

(b) Show that for any linear map $g \colon V \to \mathbb{R}$ there is an $x \in \mathbb{R}^n$ such that $g = h_x$.

(c) In the prior item we fixed the basis and varied the x to get all possible linear
maps. Can we get all possible linear maps by fixing an x and varying the basis?

2.26 Let V, W, X be vector spaces with bases B, C, D.

(a) Suppose that $h \colon V \to W$ is represented with respect to B, C by the matrix H.
Give the matrix representing the scalar multiple rh (where $r \in \mathbb{R}$) with respect
to B, C by expressing it in terms of H.

(b) Suppose that $h, g \colon V \to W$ are represented with respect to B, C by H and G.
Give the matrix representing h + g with respect to B, C by expressing it in terms
of H and G.

(c) Suppose that $h \colon V \to W$ is represented with respect to B, C by H and $g \colon W \to X$
is represented with respect to C, D by G. Give the matrix representing $g \circ h$
with respect to B, D by expressing it in terms of H and G.




IV Matrix Operations 

The prior section shows how matrices represent linear maps. When we see a new 
idea, a good strategy is to explore how it interacts with things that we already 
understand. In the first subsection below we will see how the representation 
of a scalar product r • f relates to the representation of f , and also how the 
representation of the sum of two maps f + g relates to the representations of f 
and g. In the later subsections we will explore the representation of linear map 
composition and inverse. 



IV.1 Sums and Scalar Products

We start with an example showing the relationship between the representation 
of a function and the representation of a scalar multiple of that function. 

1.1 Example Let $f \colon V \to W$ be a linear function represented with respect to some
bases by this matrix.

$$\text{Rep}_{B,D}(f) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$$

Consider the scalar multiple map $5f \colon V \to W$. We want to see how to compute
$\text{Rep}_{B,D}(5f)$ from $\text{Rep}_{B,D}(f)$.

The difference between the functions is that if f takes $v \mapsto w$ then 5f takes
$v \mapsto 5w$. So consider the representations of the domain and codomain vectors
that are associated by f.

$$\text{Rep}_B(v) = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \qquad \text{Rep}_D(w) = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$$

Where the basis D is $\langle \delta_1, \delta_2 \rangle$ the representation above says that $w = w_1\delta_1 + w_2\delta_2$.
Since $5w = 5 \cdot (w_1\delta_1 + w_2\delta_2) = (5w_1)\delta_1 + (5w_2)\delta_2$ we have that 5f associates
v with the vector having this representation.

$$\text{Rep}_D(5w) = \begin{pmatrix} 5w_1 \\ 5w_2 \end{pmatrix}$$

Changing the map from f to 5f has the effect of changing the representation of
the codomain vector by multiplying its entries by 5.
So $\text{Rep}_{B,D}(5f)$ is this matrix.

$$\text{Rep}_{B,D}(5f) \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 5v_1 \\ 5v_1 + 5v_2 \end{pmatrix} \qquad \text{Rep}_{B,D}(5f) = \begin{pmatrix} 5 & 0 \\ 5 & 5 \end{pmatrix}$$

Therefore going from the matrix representing f to the matrix representing 5f
just means multiplying all the entries by 5.

Consider also how to compute the representation of the sum of two maps.

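1.2 Example Suppose that two linear maps with the same domain and codomain
$f, g \colon \mathbb{R}^2 \to \mathbb{R}^2$ are represented with respect to some bases B and D by these
matrices.

$$\text{Rep}_{B,D}(f) = \begin{pmatrix} 1 & 3 \\ 2 & 0 \end{pmatrix} \qquad \text{Rep}_{B,D}(g) = \begin{pmatrix} -2 & -1 \\ 2 & 4 \end{pmatrix}$$

Recall the definition of the sum of two functions: if f does $v \mapsto u$ and g does
$v \mapsto w$ then f + g is the function $v \mapsto u + w$. If these are the representations of
the vectors

$$\text{Rep}_B(v) = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \qquad \text{Rep}_D(u) = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \qquad \text{Rep}_D(w) = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$$

where $D = \langle \delta_1, \delta_2 \rangle$ then we have $u + w = (u_1\delta_1 + u_2\delta_2) + (w_1\delta_1 + w_2\delta_2) =
(u_1 + w_1)\delta_1 + (u_2 + w_2)\delta_2$ and so this is the representation of the vector sum.

$$\text{Rep}_D(u + w) = \begin{pmatrix} u_1 + w_1 \\ u_2 + w_2 \end{pmatrix}$$

Since these represent the actions of f and g

$$\text{Rep}_{B,D}(f) \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} v_1 + 3v_2 \\ 2v_1 \end{pmatrix} \qquad \text{Rep}_{B,D}(g) \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} -2v_1 - v_2 \\ 2v_1 + 4v_2 \end{pmatrix}$$

this represents the action of f + g.

$$\text{Rep}_{B,D}(f + g) \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} -v_1 + 2v_2 \\ 4v_1 + 4v_2 \end{pmatrix}$$

Therefore, we compute the matrix representing the function sum by adding the
entries of the two matrices representing the functions.

$$\text{Rep}_{B,D}(f + g) = \begin{pmatrix} -1 & 2 \\ 4 & 4 \end{pmatrix}$$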


1.3 Definition The scalar multiple of a matrix is the result of entry-by-entry
scalar multiplication. The sum of two same-sized matrices is their entry-by-entry
sum.

These operations extend the first chapter's addition and scalar multiplication
operations on vectors.

1.4 Theorem Let $h, g \colon V \to W$ be linear maps represented with respect to bases
B, D by the matrices H and G and let r be a scalar. Then with respect to
B, D the map $r \cdot h \colon V \to W$ is represented by rH and the map $h + g \colon V \to W$ is
represented by H + G.

Proof This is Exercise 9. Generalize the examples above. QED

1.5 Remark These two operations are simple. However we did not define them 
this way because they are simple. We defined them this way because they reflect 
the operations of function scalar multiplication and function addition. Their 
simplicity is a pleasant bonus. 
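
As a quick numerical check of Theorem 1.4, here is a minimal sketch in plain Python. The helper names (`mat_add`, `scalar_mul`, `apply`) and the sample matrices and vector are illustrative assumptions, not something the text defines; matrices are nested lists and `apply` stands in for matrix-vector multiplication.

```python
def mat_add(G, H):
    """Entry-by-entry sum of two same-sized matrices."""
    return [[g + h for g, h in zip(g_row, h_row)] for g_row, h_row in zip(G, H)]

def scalar_mul(r, H):
    """Entry-by-entry scalar multiple of a matrix."""
    return [[r * h for h in row] for row in H]

def apply(M, v):
    """Matrix-vector product: each row of M dotted with v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Illustrative representations of two maps and of one input vector.
F = [[1, 3], [2, 0]]
G = [[-2, -1], [2, 4]]
v = [7, -3]

# (f + g)(v) computed with the summed matrix, and as f(v) + g(v).
sum_via_matrix = apply(mat_add(F, G), v)
sum_via_maps = [a + b for a, b in zip(apply(F, v), apply(G, v))]

# (5f)(v) computed with the rescaled matrix, and as 5 * f(v).
scaled_via_matrix = apply(scalar_mul(5, F), v)
scaled_via_map = [5 * a for a in apply(F, v)]

print(sum_via_matrix == sum_via_maps, scaled_via_matrix == scaled_via_map)
```

Both comparisons agree for this input, which is what the theorem predicts for every input; a single test vector is only a spot check and not a proof.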

In the next subsection we will define another operation, matrix multiplication. 
Based on the above two operations a person's first thought may be to take an 
entry-by-entry product of two matrices. While in theory we could define whatever 
matrix operations we like, our program here is to instead be practical and define 
the new operation so that it combines the entries in the way that represents 
function composition. That is, we are defining matrix operations by referencing 
function operations. 

We can express this point another way. Recall Theorem III.1.4, which says
that matrix-vector multiplication represents the application of a linear map.
Following it, Remark III.1.6 notes that the theorem justifies the definition of
matrix-vector multiplication, and so in some sense the theorem must hold. If
the definition were such that the theorem didn't hold then we would adjust the
definition until it did hold. The above Theorem 1.4 is another example of such
a result.

A special case of scalar multiplication is multiplication by zero. For any map h,
$0 \cdot h$ is the zero homomorphism and for any matrix H, $0 \cdot H$ is the matrix with all
entries zero.

1.6 Definition A zero matrix has all entries 0. We write $Z_{n \times m}$ or simply Z
(another common notation is $0_{n \times m}$ or just 0).

1.7 Example The zero map from any three-dimensional space to any
two-dimensional space is represented by the 2×3 zero matrix

$$Z = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

no matter what domain and codomain bases we use.
Exercises 

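/ 1.8 Perform the indicated operations, if defined.

(a) $\begin{pmatrix} 5 & -1 & 2 \\ 6 & 1 & 1 \end{pmatrix} + \begin{pmatrix} 2 & 1 & 4 \\ 3 & 0 & 5 \end{pmatrix}$

(b) $6 \cdot \begin{pmatrix} 2 & -1 & -1 \\ 1 & 2 & 3 \end{pmatrix}$

(c) $\begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix} + \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}$

(d) $4 \begin{pmatrix} 1 & 2 \\ 3 & -1 \end{pmatrix} + 5 \begin{pmatrix} -1 & 4 \\ -2 & 1 \end{pmatrix}$

(e) $3 \begin{pmatrix} 2 & 1 \\ 3 & 0 \end{pmatrix} + 2 \begin{pmatrix} 1 & 1 & 4 \\ 3 & 0 & 5 \end{pmatrix}$

1.9 Prove Theorem 1.4.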

(a) Prove that matrix addition represents addition of linear maps. 

(b) Prove that matrix scalar multiplication represents scalar multiplication of 
linear maps. 

/ 1.10 Prove each, assuming that the operations are defined, where G, H, and J are
matrices, where Z is the zero matrix, and where r and s are scalars.

(a) Matrix addition is commutative: G + H = H + G.

(b) Matrix addition is associative: G + (H + J) = (G + H) + J.

(c) The zero matrix is an additive identity: G + Z = G.

(d) $0 \cdot G = Z$

(e) (r + s)G = rG + sG

(f) Matrices have an additive inverse: $G + (-1) \cdot G = Z$.

(g) r(G + H) = rG + rH

(h) (rs)G = r(sG)

1.11 Fix domain and codomain spaces. In general, one matrix can represent many 
different maps with respect to different bases. However, prove that a zero matrix 
represents only a zero map. Are there other such matrices? 

/ 1.12 Let V and W be vector spaces of dimensions n and m. Show that the space
$\mathcal{L}(V, W)$ of linear maps from V to W is isomorphic to $\mathcal{M}_{m \times n}$.

/ 1.13 Show that it follows from the prior questions that for any six transformations
$t_1, \ldots, t_6 \colon \mathbb{R}^2 \to \mathbb{R}^2$ there are scalars $c_1, \ldots, c_6 \in \mathbb{R}$ such that $c_1t_1 + \cdots + c_6t_6$ is
the zero map. (Hint: the six is slightly misleading.)

1.14 The trace of a square matrix is the sum of the entries on the main diagonal
(the 1,1 entry plus the 2,2 entry, etc.; we will see the significance of the trace in
Chapter Five). Show that trace(H + G) = trace(H) + trace(G). Is there a similar
result for scalar multiplication?

1.15 Recall that the transpose of a matrix M is another matrix, whose i,j entry is
the j,i entry of M. Verify these identities.

(a) $(G + H)^T = G^T + H^T$

(b) $(r \cdot H)^T = r \cdot H^T$

/ 1.16 A square matrix is symmetric if each i,j entry equals the j,i entry, that is, if
the matrix equals its transpose.

(a) Prove that for any square H, the matrix $H + H^T$ is symmetric. Does every
symmetric matrix have this form?

(b) Prove that the set of n×n symmetric matrices is a subspace of $\mathcal{M}_{n \times n}$.

/ 1.17 (a) How does matrix rank interact with scalar multiplication — can a scalar 
product of a rank n matrix have rank less than n? Greater? 
(b) How does matrix rank interact with matrix addition — can a sum of rank n 
matrices have rank less than n? Greater? 



IV.2 Matrix Multiplication 

After representing addition and scalar multiplication of linear maps in the prior
subsection, the natural next map operation to consider is composition.

2.1 Lemma The composition of linear maps is linear.

Proof (This argument has appeared earlier, as part of the proof of
Theorem I.2.2.) Let $h \colon V \to W$ and $g \colon W \to U$ be linear. The calculation

$$\begin{aligned} g \circ h\,(c_1 \cdot v_1 + c_2 \cdot v_2) &= g\bigl(h(c_1 \cdot v_1 + c_2 \cdot v_2)\bigr) = g\bigl(c_1 \cdot h(v_1) + c_2 \cdot h(v_2)\bigr) \\ &= c_1 \cdot g(h(v_1)) + c_2 \cdot g(h(v_2)) = c_1 \cdot (g \circ h)(v_1) + c_2 \cdot (g \circ h)(v_2) \end{aligned}$$

shows that $g \circ h \colon V \to U$ preserves linear combinations. QED

To see how the representation of the composite relates to the representations 
of the compositors, consider an example. 

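2.2 Example Let $h \colon \mathbb{R}^4 \to \mathbb{R}^2$ and $g \colon \mathbb{R}^2 \to \mathbb{R}^3$, fix bases $B \subset \mathbb{R}^4$, $C \subset \mathbb{R}^2$,
$D \subset \mathbb{R}^3$, and let these be the representations.

$$H = \text{Rep}_{B,C}(h) = \begin{pmatrix} 4 & 6 & 8 & 2 \\ 5 & 7 & 9 & 3 \end{pmatrix}_{B,C} \qquad G = \text{Rep}_{C,D}(g) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}_{C,D}$$

To represent the composition $g \circ h \colon \mathbb{R}^4 \to \mathbb{R}^3$ we start with a v, represent h of
v, and then represent g of that. The representation of h(v) is the product of h's
matrix and v's vector.

$$\text{Rep}_C(h(v)) = \begin{pmatrix} 4 & 6 & 8 & 2 \\ 5 & 7 & 9 & 3 \end{pmatrix}_{B,C} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix}_B = \begin{pmatrix} 4v_1 + 6v_2 + 8v_3 + 2v_4 \\ 5v_1 + 7v_2 + 9v_3 + 3v_4 \end{pmatrix}_C$$

The representation of g(h(v)) is the product of g's matrix and h(v)'s vector.

$$\begin{aligned} \text{Rep}_D(g(h(v))) &= \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}_{C,D} \begin{pmatrix} 4v_1 + 6v_2 + 8v_3 + 2v_4 \\ 5v_1 + 7v_2 + 9v_3 + 3v_4 \end{pmatrix}_C \\ &= \begin{pmatrix} 1 \cdot (4v_1 + 6v_2 + 8v_3 + 2v_4) + 1 \cdot (5v_1 + 7v_2 + 9v_3 + 3v_4) \\ 0 \cdot (4v_1 + 6v_2 + 8v_3 + 2v_4) + 1 \cdot (5v_1 + 7v_2 + 9v_3 + 3v_4) \\ 1 \cdot (4v_1 + 6v_2 + 8v_3 + 2v_4) + 0 \cdot (5v_1 + 7v_2 + 9v_3 + 3v_4) \end{pmatrix}_D \end{aligned}$$

Distributing and regrouping on the v's gives

$$= \begin{pmatrix} (1 \cdot 4 + 1 \cdot 5)v_1 + (1 \cdot 6 + 1 \cdot 7)v_2 + (1 \cdot 8 + 1 \cdot 9)v_3 + (1 \cdot 2 + 1 \cdot 3)v_4 \\ (0 \cdot 4 + 1 \cdot 5)v_1 + (0 \cdot 6 + 1 \cdot 7)v_2 + (0 \cdot 8 + 1 \cdot 9)v_3 + (0 \cdot 2 + 1 \cdot 3)v_4 \\ (1 \cdot 4 + 0 \cdot 5)v_1 + (1 \cdot 6 + 0 \cdot 7)v_2 + (1 \cdot 8 + 0 \cdot 9)v_3 + (1 \cdot 2 + 0 \cdot 3)v_4 \end{pmatrix}_D$$

which we recognize as the result of this matrix-vector product.

$$= \begin{pmatrix} 1 \cdot 4 + 1 \cdot 5 & 1 \cdot 6 + 1 \cdot 7 & 1 \cdot 8 + 1 \cdot 9 & 1 \cdot 2 + 1 \cdot 3 \\ 0 \cdot 4 + 1 \cdot 5 & 0 \cdot 6 + 1 \cdot 7 & 0 \cdot 8 + 1 \cdot 9 & 0 \cdot 2 + 1 \cdot 3 \\ 1 \cdot 4 + 0 \cdot 5 & 1 \cdot 6 + 0 \cdot 7 & 1 \cdot 8 + 0 \cdot 9 & 1 \cdot 2 + 0 \cdot 3 \end{pmatrix}_{B,D} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix}_B$$

Thus the matrix representing $g \circ h$ has the rows of G combined with the columns
of H.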

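2.3 Definition The matrix-multiplicative product of the m×r matrix G and the
r×n matrix H is the m×n matrix P, where

$$p_{i,j} = g_{i,1}h_{1,j} + g_{i,2}h_{2,j} + \cdots + g_{i,r}h_{r,j}$$

that is, the i,j-th entry of the product is the dot product of the i-th row of the
first matrix with the j-th column of the second.

$$GH = \begin{pmatrix} & \vdots & & \\ g_{i,1} & g_{i,2} & \cdots & g_{i,r} \\ & \vdots & & \end{pmatrix} \begin{pmatrix} & h_{1,j} & \\ \cdots & h_{2,j} & \cdots \\ & \vdots & \\ & h_{r,j} & \end{pmatrix} = \begin{pmatrix} & \vdots & \\ \cdots & p_{i,j} & \cdots \\ & \vdots & \end{pmatrix}$$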

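2.4 Example The matrices from Example 2.2 combine in this way.

$$\begin{pmatrix} 1 \cdot 4 + 1 \cdot 5 & 1 \cdot 6 + 1 \cdot 7 & 1 \cdot 8 + 1 \cdot 9 & 1 \cdot 2 + 1 \cdot 3 \\ 0 \cdot 4 + 1 \cdot 5 & 0 \cdot 6 + 1 \cdot 7 & 0 \cdot 8 + 1 \cdot 9 & 0 \cdot 2 + 1 \cdot 3 \\ 1 \cdot 4 + 0 \cdot 5 & 1 \cdot 6 + 0 \cdot 7 & 1 \cdot 8 + 0 \cdot 9 & 1 \cdot 2 + 0 \cdot 3 \end{pmatrix} = \begin{pmatrix} 9 & 13 & 17 & 5 \\ 5 & 7 & 9 & 3 \\ 4 & 6 & 8 & 2 \end{pmatrix}$$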



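2.5 Example

$$\begin{pmatrix} 2 & 0 \\ 4 & 6 \\ 8 & 2 \end{pmatrix} \begin{pmatrix} 1 & 3 \\ 5 & 7 \end{pmatrix} = \begin{pmatrix} 2 \cdot 1 + 0 \cdot 5 & 2 \cdot 3 + 0 \cdot 7 \\ 4 \cdot 1 + 6 \cdot 5 & 4 \cdot 3 + 6 \cdot 7 \\ 8 \cdot 1 + 2 \cdot 5 & 8 \cdot 3 + 2 \cdot 7 \end{pmatrix} = \begin{pmatrix} 2 & 6 \\ 34 & 54 \\ 18 & 38 \end{pmatrix}$$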



We next check that our definition of the matrix-matrix multiplication
operation does what we intend.

2.6 Theorem A composition of linear maps is represented by the matrix product 
of the representatives. 

Proof This argument generalizes Example 2.2. Let $h \colon V \to W$ and $g \colon W \to X$
be represented by H and G with respect to bases $B \subset V$, $C \subset W$, and $D \subset X$, of
sizes n, r, and m. For any $v \in V$, the k-th component of $\text{Rep}_C(h(v))$ is

$$h_{k,1}v_1 + \cdots + h_{k,n}v_n$$

and so the i-th component of $\text{Rep}_D(g \circ h\,(v))$ is this.

$$g_{i,1} \cdot (h_{1,1}v_1 + \cdots + h_{1,n}v_n) + g_{i,2} \cdot (h_{2,1}v_1 + \cdots + h_{2,n}v_n) + \cdots + g_{i,r} \cdot (h_{r,1}v_1 + \cdots + h_{r,n}v_n)$$

Distribute and regroup on the v's.

$$= (g_{i,1}h_{1,1} + g_{i,2}h_{2,1} + \cdots + g_{i,r}h_{r,1}) \cdot v_1 + \cdots + (g_{i,1}h_{1,n} + g_{i,2}h_{2,n} + \cdots + g_{i,r}h_{r,n}) \cdot v_n$$

Finish by recognizing that the coefficient of each $v_j$

$$g_{i,1}h_{1,j} + g_{i,2}h_{2,j} + \cdots + g_{i,r}h_{r,j}$$

matches the definition of the i,j entry of the product GH. QED
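
The theorem can be spot-checked numerically. The sketch below, in plain Python with hypothetical helper names `mat_mul` and `apply`, uses the two matrices that Example 2.2 works with (H with rows 4 6 8 2 and 5 7 9 3, and G with rows 1 1, then 0 1, then 1 0) together with an arbitrarily chosen input vector, which is an assumption made only for illustration. It compares applying H and then G against applying the single product GH.

```python
def mat_mul(G, H):
    """Product per Definition 2.3: entry i,j is row i of G dotted with column j of H."""
    r = len(H)
    if any(len(row) != r for row in G):
        raise ValueError("columns of the left factor must match rows of the right factor")
    return [[sum(G[i][k] * H[k][j] for k in range(r)) for j in range(len(H[0]))]
            for i in range(len(G))]

def apply(M, v):
    """Matrix-vector product: each row of M dotted with v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

H = [[4, 6, 8, 2], [5, 7, 9, 3]]   # represents h, 2x4
G = [[1, 1], [0, 1], [1, 0]]       # represents g, 3x2
v = [1, 2, 3, 4]                   # arbitrary sample representation of an input

GH = mat_mul(G, H)
two_steps = apply(G, apply(H, v))  # first h, then g
one_step = apply(GH, v)            # the composition in a single multiplication

print(GH)
print(two_steps == one_step)
```

One design note: `mat_mul` raises an error when the sizes do not match, mirroring the requirement that the composition of the underlying maps be defined.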

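This arrow diagram pictures the relationship between maps and matrices
('wrt' abbreviates 'with respect to').

[Diagram: a triangle of arrows. One arrow runs from $V_{\text{wrt } B}$ to $W_{\text{wrt } C}$, labeled h above and H below. A second runs from $W_{\text{wrt } C}$ to $X_{\text{wrt } D}$, labeled g above and G below. A third runs straight across from $V_{\text{wrt } B}$ to $X_{\text{wrt } D}$, labeled $g \circ h$ above and GH below.]

Above the arrows, the maps show that the two ways of going from V to X,
straight over via the composition or else in two steps by way of W, have the
same effect (this is just the definition of composition). Below the arrows, the matrices
indicate that the product does the same thing: multiplying GH into the column
vector $\text{Rep}_B(v)$ has the same effect as multiplying the column vector first by H
and then multiplying the result by G.

$$\text{Rep}_{B,D}(g \circ h) = GH \qquad \text{Rep}_{C,D}(g)\,\text{Rep}_{B,C}(h) = GH$$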

2.7 Example Because the number of columns on the left does not equal the 
number of rows on the right, this product is not defined. 




One way to understand why the combination in the prior example is undefined
has to do with the underlying maps. We require that the sizes match because
we want that the underlying function composition is possible.

$$\text{dimension } n \text{ space} \xrightarrow{\;h\;} \text{dimension } r \text{ space} \xrightarrow{\;g\;} \text{dimension } m \text{ space}$$

Thus, matrix product combines an m×r matrix G with an r×n matrix F to
yield the m×n result GF. Briefly, 'm×r times r×n equals m×n'.

2.8 Remark The order in which these things are written can be confusing. In
the prior equation, the number written first, m, is the dimension of g's codomain
and is thus the number that appears last in the map dimension description
above. The explanation is that while h is done first and then g, we write the
composition as $g \circ h$, from the notation 'g(h(v))'. (Some people try to lessen
confusion by reading '$g \circ h$' aloud as "g following h.") That right to left order
carries over to matrices: $g \circ h$ is represented by GH.

We can get insight into the matrix-matrix product operation by studying how
the entries combine. For instance, an alternative way to understand why we
require above that the sizes match is that the row of the left-hand matrix must
have the same number of entries as the column of the right-hand matrix, or else
some entry will be left without a matching entry from the other matrix.

Another aspect of the combinatorics of matrix multiplication is that in the
definition of the i,j entry

$$p_{i,j} = g_{i,\mathbf{1}}h_{\mathbf{1},j} + g_{i,\mathbf{2}}h_{\mathbf{2},j} + \cdots + g_{i,\mathbf{r}}h_{\mathbf{r},j}$$

the highlighted subscripts on the g's are column indices while those on the h's
indicate rows. That is, the summation takes place over the columns of G but
over the rows of H; the definition treats left differently than right. So we may
reasonably suspect that GH can be unequal to HG.

2.9 Example Matrix multiplication is not commutative. 







2.10 Example Commutativity can fail more dramatically: 



23 34 0^ 
,31 46 0, 



while 



isn't even defined. 

2.11 Remark The fact that matrix multiplication is not commutative can be
puzzling at first, perhaps because most operations that people see in prior
mathematics courses are commutative. But matrix multiplication represents
function composition, which is not commutative: if f(x) = 2x and g(x) = x + 1
then $g \circ f(x) = 2x + 1$ while $f \circ g(x) = 2(x + 1) = 2x + 2$. (True, this g is not
linear and we might have hoped that linear functions would commute but this
shows that the failure of commutativity for matrix multiplication fits into a
larger context.)
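
The following short Python sketch illustrates both failures of commutativity. The functions f and g are the ones from the remark; the matrices `A`, `B`, and `C` and the helper `mat_mul` are illustrative assumptions chosen for this sketch.

```python
def mat_mul(G, H):
    """Product per Definition 2.3; raises ValueError when the sizes do not match."""
    r = len(H)
    if any(len(row) != r for row in G):
        raise ValueError("columns of the left factor must match rows of the right factor")
    return [[sum(G[i][k] * H[k][j] for k in range(r)) for j in range(len(H[0]))]
            for i in range(len(G))]

# Function composition is not commutative.
f = lambda x: 2 * x
g = lambda x: x + 1
g_after_f = g(f(3))   # 2*3 + 1
f_after_g = f(g(3))   # 2*(3 + 1)

# Two square matrices whose products in the two orders differ.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
AB = mat_mul(A, B)
BA = mat_mul(B, A)

# A 2x3 matrix: B times C is defined, but C times B is not.
C = [[1, 2, 0], [3, 4, 0]]
BC = mat_mul(B, C)
try:
    mat_mul(C, B)
    cb_defined = True
except ValueError:
    cb_defined = False

print(g_after_f, f_after_g, AB != BA, cb_defined)
```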

Except for the lack of commutativity, matrix multiplication is algebraically 
well-behaved. Below are some nice properties and more are in Exercise 24 and 
Exercise 25. 



2.12 Theorem If F, G, and H are matrices, and the matrix products are defined,
then the product is associative (FG)H = F(GH) and distributes over matrix
addition F(G + H) = FG + FH and (G + H)F = GF + HF.

Proof Associativity holds because matrix multiplication represents function
composition, which is associative: the maps $(f \circ g) \circ h$ and $f \circ (g \circ h)$ are equal
as both send v to f(g(h(v))).

Distributivity is similar. For instance, the first one goes $f \circ (g + h)\,(v) =
f\bigl((g + h)(v)\bigr) = f\bigl(g(v) + h(v)\bigr) = f(g(v)) + f(h(v)) = f \circ g\,(v) + f \circ h\,(v)$ (the
third equality uses the linearity of f). QED

2.13 Remark We could instead prove that result by slogging through the indices.
For example, for associativity the i,j-th entry of (FG)H is

$$\begin{aligned} &(f_{i,1}g_{1,1} + f_{i,2}g_{2,1} + \cdots + f_{i,r}g_{r,1})h_{1,j} \\ &\quad + (f_{i,1}g_{1,2} + f_{i,2}g_{2,2} + \cdots + f_{i,r}g_{r,2})h_{2,j} \\ &\quad \;\vdots \\ &\quad + (f_{i,1}g_{1,s} + f_{i,2}g_{2,s} + \cdots + f_{i,r}g_{r,s})h_{s,j} \end{aligned}$$

(where F, G, and H are m×r, r×s, and s×n matrices), distribute

$$\begin{aligned} &f_{i,1}g_{1,1}h_{1,j} + f_{i,2}g_{2,1}h_{1,j} + \cdots + f_{i,r}g_{r,1}h_{1,j} \\ &\quad + f_{i,1}g_{1,2}h_{2,j} + f_{i,2}g_{2,2}h_{2,j} + \cdots + f_{i,r}g_{r,2}h_{2,j} \\ &\quad \;\vdots \\ &\quad + f_{i,1}g_{1,s}h_{s,j} + f_{i,2}g_{2,s}h_{s,j} + \cdots + f_{i,r}g_{r,s}h_{s,j} \end{aligned}$$

and regroup around the f's

$$\begin{aligned} &f_{i,1}(g_{1,1}h_{1,j} + g_{1,2}h_{2,j} + \cdots + g_{1,s}h_{s,j}) \\ &\quad + f_{i,2}(g_{2,1}h_{1,j} + g_{2,2}h_{2,j} + \cdots + g_{2,s}h_{s,j}) \\ &\quad \;\vdots \\ &\quad + f_{i,r}(g_{r,1}h_{1,j} + g_{r,2}h_{2,j} + \cdots + g_{r,s}h_{s,j}) \end{aligned}$$

to get the i,j entry of F(GH).

Contrast the two ways of verifying associativity. The argument just above
is hard to understand in that while the calculations are easy to check, the
arithmetic seems unconnected to any idea. The argument in the proof is shorter
and says why this property "really" holds. This illustrates the comments made at
the start of the chapter on vector spaces: at least some of the time an argument
from higher-level constructs is clearer.

We have now seen how to construct the representation of the composition of 
two linear maps from the representations of the two maps. We have called the 
combination the product of the two matrices. We will explore this operation 
more in the next subsection. 

Exercises 

/ 2.14 Compute, or state "not defined".

2.15 Where

$$A = \begin{pmatrix} 1 & -1 \\ 2 & 0 \end{pmatrix} \qquad B = \begin{pmatrix} 5 & 2 \\ 4 & 4 \end{pmatrix} \qquad C = \begin{pmatrix} -2 & 3 \\ -4 & 1 \end{pmatrix}$$

compute or state 'not defined'.

(a) AB (b) (AB)C (c) BC (d) A(BC)
2.16 Which products are defined?

(a) 3×2 times 2×3 (b) 2×3 times 3×2 (c) 2×2 times 3×3
(d) 3×3 times 2×2
/ 2.17 Give the size of the product or state "not defined".

(a) a 2×3 matrix times a 3×1 matrix

(b) a 1×12 matrix times a 12×1 matrix

(c) a 2×3 matrix times a 2×1 matrix

(d) a 2×2 matrix times a 2×2 matrix

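/ 2.18 Find the system of equations resulting from starting with

$$\begin{aligned} h_{1,1}x_1 + h_{1,2}x_2 + h_{1,3}x_3 &= d_1 \\ h_{2,1}x_1 + h_{2,2}x_2 + h_{2,3}x_3 &= d_2 \end{aligned}$$

and making this change of variable (i.e., substitution).

$$\begin{aligned} x_1 &= g_{1,1}y_1 + g_{1,2}y_2 \\ x_2 &= g_{2,1}y_1 + g_{2,2}y_2 \\ x_3 &= g_{3,1}y_1 + g_{3,2}y_2 \end{aligned}$$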

2.19 As Definition 2.3 points out, the matrix product operation generalizes the dot
product. Is the dot product of a 1×n row vector and an n×1 column vector the
same as their matrix-multiplicative product?
/ 2.20 Represent the derivative map on $\mathcal{P}_n$ with respect to B, B where B is the natural
basis $\langle 1, x, \ldots, x^n \rangle$. Show that the product of this matrix with itself is defined;
what map does it represent?

2.21 [Cleary] Match each type of matrix with all these descriptions that could fit:
(i) can be multiplied by its transpose to make a 1×1 matrix, (ii) is similar to the
3×3 matrix of all zeros, (iii) can represent a linear map from $\mathbb{R}^3$ to $\mathbb{R}^2$ that is not
onto, (iv) can represent an isomorphism from $\mathbb{R}^3$ to $\mathcal{P}_2$.

(a) a 2 X 3 matrix whose rank is 1 

(b) a 3 X 3 matrix that is nonsingular 

(c) a 2 X 2 matrix that is singular 

(d) an n X 1 column vector 

2.22 Show that composition of linear transformations on $\mathbb{R}^1$ is commutative. Is this
true for any one-dimensional space?

2.23 Why is matrix multiplication not defined as entry-wise multiplication? That
would be easier, and commutative too.

/ 2.24 (a) Prove that $H^pH^q = H^{p+q}$ and $(H^p)^q = H^{pq}$ for positive integers p, q.

(b) Prove that $(rH)^p = r^p \cdot H^p$ for any positive integer p and scalar $r \in \mathbb{R}$.
/ 2.25 (a) How does matrix multiplication interact with scalar multiplication: is
r(GH) = (rG)H? Is G(rH) = r(GH)?
(b) How does matrix multiplication interact with linear combinations: is F(rG +
sH) = r(FG) + s(FH)? Is (rF + sG)H = rFH + sGH?
2.26 We can ask how the matrix product operation interacts with the transpose 
operation. 

(a) Show that $(GH)^T = H^TG^T$.

(b) A square matrix is symmetric if each i,j entry equals the j,i entry, that is, if
the matrix equals its own transpose. Show that the matrices $HH^T$ and $H^TH$ are
symmetric.

/ 2.27 Rotation of vectors in $\mathbb{R}^3$ about an axis is a linear map. Show that linear maps
do not commute by showing geometrically that rotations do not commute.

2.28 In the proof of Theorem 2.12 we used some maps. What are the domains and 
codomains? 

2.29 How does matrix rank interact with matrix multiplication? 

(a) Can the product of rank n matrices have rank less than n? Greater? 

(b) Show that the rank of the product of two matrices is less than or equal to the 
minimum of the rank of each factor. 

2.30 Is 'commutes with' an equivalence relation among nxn matrices? 

/ 2.31 (We will use this exercise in the Matrix Inverses exercises.) Here is another 
property of matrix multiplication that might be puzzling at first sight. 

(a) Prove that the composition of the projections $\pi_x, \pi_y \colon \mathbb{R}^2 \to \mathbb{R}^2$ onto the x and
y axes is the zero map despite that neither one is itself the zero map.

(b) Prove that the composition of the derivatives $d^2/dx^2, d^3/dx^3 \colon \mathcal{P}_4 \to \mathcal{P}_4$ is the
zero map despite that neither is the zero map.

(c) Give a matrix equation representing the first fact.

(d) Give a matrix equation representing the second.

When two things multiply to give zero despite that neither is zero we say that each
is a zero divisor.

2.32 Show that, for square matrices, (S + T)(S − T) need not equal $S^2 - T^2$.
/ 2.33 Represent the identity transformation $\text{id} \colon V \to V$ with respect to B, B for any
basis B. This is the identity matrix I. Show that this matrix plays the role in matrix
multiplication that the number 1 plays in real number multiplication: HI = IH = H
(for all matrices H for which the product is defined).

2.34 In real number algebra, quadratic equations have at most two solutions. That
is not so with matrix algebra. Show that the 2×2 matrix equation $T^2 = I$ has more
than two solutions, where I is the identity matrix (this matrix has ones in its 1,1
and 2,2 entries and zeroes elsewhere; see Exercise 33).

2.35 (a) Prove that for any 2×2 matrix T there are scalars $c_0, \ldots, c_4$ that are not
all 0 such that the combination $c_4T^4 + c_3T^3 + c_2T^2 + c_1T + c_0I$ is the zero matrix
(where I is the 2×2 identity matrix, with 1's in its 1,1 and 2,2 entries and zeroes
elsewhere; see Exercise 33).

(b) Let p(x) be a polynomial $p(x) = c_nx^n + \cdots + c_1x + c_0$. If T is a square
matrix we define p(T) to be the matrix $c_nT^n + \cdots + c_1T + c_0I$ (where I is the
appropriately-sized identity matrix). Prove that for any square matrix there is a
polynomial such that p(T) is the zero matrix.

(c) The minimal polynomial m(x) of a square matrix is the polynomial of least
degree, and with leading coefficient 1, such that m(T) is the zero matrix. Find
the minimal polynomial of this matrix.

$$\begin{pmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{pmatrix}$$

(This is the representation with respect to $\mathcal{E}_2, \mathcal{E}_2$, the standard basis, of a rotation
through $\pi/6$ radians counterclockwise.)

2.36 The infinite-dimensional space $\mathcal{P}$ of all finite-degree polynomials gives a
memorable example of the non-commutativity of linear maps. Let $d/dx \colon \mathcal{P} \to \mathcal{P}$ be the
usual derivative and let $s \colon \mathcal{P} \to \mathcal{P}$ be the shift map.

$$a_0 + a_1x + \cdots + a_nx^n \;\mapsto\; 0 + a_0x + a_1x^2 + \cdots + a_nx^{n+1}$$

Show that the two maps don't commute, $d/dx \circ s \neq s \circ d/dx$; in fact, not only is
$(d/dx \circ s) - (s \circ d/dx)$ not the zero map, it is the identity map.

2.37 Recall the notation for the sum of the sequence of numbers $a_1, a_2, \ldots, a_n$.

$$\sum_{i=1}^{n} a_i = a_1 + a_2 + \cdots + a_n$$

In this notation, the i,j entry of the product of G and H is this.

$$p_{i,j} = \sum_{k=1}^{r} g_{i,k}h_{k,j}$$

Using this notation, 

(a) reprove that matrix multiplication is associative; 

(b) reprove Theorem 2.6. 



IV.3 Mechanics of Matrix Multiplication 

In this subsection we consider matrix multiplication as a mechanical process, 
putting aside for the moment any implications about the underlying maps. 

The striking thing about matrix multiplication is the way rows and columns
combine. The i,j entry of the matrix product is the dot product of row i of the
left matrix with column j of the right one. For instance, here a second row and
a third column combine to make a 2,3 entry.

$$\begin{pmatrix} 1 & 1 \\ \mathbf{0} & \mathbf{1} \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 4 & 6 & \mathbf{8} & 2 \\ 5 & 7 & \mathbf{9} & 3 \end{pmatrix} = \begin{pmatrix} 9 & 13 & 17 & 5 \\ 5 & 7 & \mathbf{9} & 3 \\ 4 & 6 & 8 & 2 \end{pmatrix}$$

We can view this as the left matrix acting by multiplying its rows, one at a time,
into the columns of the right matrix. Or, another perspective is that the right
matrix uses its columns to act on the left matrix's rows. Below, we will examine
actions from the left and from the right for some simple matrices.

The action of a zero matrix is easy.

3.1 Example Multiplying by an appropriately-sized zero matrix from the left or
from the right results in a zero matrix.






The next easiest to understand matrices, after the zero matrices, are the ones
with a single nonzero entry.

3.2 Definition A matrix with all 0's except for a 1 in the i,j entry is an i,j unit
matrix.

3.3 Example This is the 1,2 unit matrix with three rows and two columns,
multiplying from the left.

Acting from the left, an i,j unit matrix copies row j of the multiplicand into
row i of the result. From the right an i,j unit matrix picks out column i of the
multiplicand and copies it into column j of the result.




3.4 Example Rescaling these matrices simply rescales the result. This is the 
action from the left of the matrix that is twice the one in the prior example. 




And this is the action of the matrix that is —3 times the one from the prior 
example. 

-3^ 



/I 

4 

V 



(o 






Next in complication are matrices with two nonzero entries. There are two
cases. If a left-multiplier has entries in different rows then their actions don't
interact.

3.5 Example

But if the left-multiplier's nonzero entries are in the same row then that row of
the result is a combination.



3.6 Example 



1 2\ n 2 3 

4 5 6 
>0 0/ \7 8 9 




3\ 
6 

V 



Right-multiplication acts in the same way, but with columns. 

These observations about simple matrices extend to arbitrary ones. 

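3.7 Example Consider the columns of the product of two 2×2 matrices.

$$\begin{pmatrix} g_{1,1} & g_{1,2} \\ g_{2,1} & g_{2,2} \end{pmatrix} \begin{pmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{pmatrix} = \begin{pmatrix} g_{1,1}h_{1,1} + g_{1,2}h_{2,1} & g_{1,1}h_{1,2} + g_{1,2}h_{2,2} \\ g_{2,1}h_{1,1} + g_{2,2}h_{2,1} & g_{2,1}h_{1,2} + g_{2,2}h_{2,2} \end{pmatrix}$$

Each column is the result of multiplying G by the corresponding column of H.

$$G \begin{pmatrix} h_{1,1} \\ h_{2,1} \end{pmatrix} = \begin{pmatrix} g_{1,1}h_{1,1} + g_{1,2}h_{2,1} \\ g_{2,1}h_{1,1} + g_{2,2}h_{2,1} \end{pmatrix} \qquad G \begin{pmatrix} h_{1,2} \\ h_{2,2} \end{pmatrix} = \begin{pmatrix} g_{1,1}h_{1,2} + g_{1,2}h_{2,2} \\ g_{2,1}h_{1,2} + g_{2,2}h_{2,2} \end{pmatrix}$$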



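3.8 Lemma In a product of two matrices G and H, the columns of GH are formed
by taking G times the columns of H

$$G \cdot \begin{pmatrix} \vdots & & \vdots \\ h_1 & \cdots & h_n \\ \vdots & & \vdots \end{pmatrix} = \begin{pmatrix} \vdots & & \vdots \\ G \cdot h_1 & \cdots & G \cdot h_n \\ \vdots & & \vdots \end{pmatrix}$$

and the rows of GH are formed by taking the rows of G times H

$$\begin{pmatrix} \cdots & g_1 & \cdots \\ & \vdots & \\ \cdots & g_r & \cdots \end{pmatrix} \cdot H = \begin{pmatrix} \cdots & g_1 \cdot H & \cdots \\ & \vdots & \\ \cdots & g_r \cdot H & \cdots \end{pmatrix}$$

(ignoring the extra parentheses).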



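Proof We will check that in a product of 2×2 matrices, the rows of the product
equal the product of the rows of G with the entire matrix H.

$$\begin{pmatrix} g_{1,1} & g_{1,2} \\ g_{2,1} & g_{2,2} \end{pmatrix} \begin{pmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{pmatrix} = \begin{pmatrix} (g_{1,1} \;\; g_{1,2})H \\ (g_{2,1} \;\; g_{2,2})H \end{pmatrix} = \begin{pmatrix} (g_{1,1}h_{1,1} + g_{1,2}h_{2,1} \quad g_{1,1}h_{1,2} + g_{1,2}h_{2,2}) \\ (g_{2,1}h_{1,1} + g_{2,2}h_{2,1} \quad g_{2,1}h_{1,2} + g_{2,2}h_{2,2}) \end{pmatrix}$$

We will leave the more general check as an exercise. QED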
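
The two statements of Lemma 3.8 can be checked on a small case. This plain-Python sketch uses hypothetical helpers (`mat_mul`, `apply`, `column`) and two sample matrices picked for illustration; it compares each column of GH with G times the matching column of H, and each row of GH with the matching row of G times H.

```python
def mat_mul(G, H):
    """Entry i,j is row i of G dotted with column j of H."""
    r = len(H)
    return [[sum(G[i][k] * H[k][j] for k in range(r)) for j in range(len(H[0]))]
            for i in range(len(G))]

def apply(M, v):
    """Matrix times a column vector given as a flat list."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def column(M, j):
    """Column j of M as a flat list."""
    return [row[j] for row in M]

G = [[1, 1], [0, 1], [1, 0]]        # sample 3x2 matrix
H = [[4, 6, 8, 2], [5, 7, 9, 3]]    # sample 2x4 matrix
GH = mat_mul(G, H)

# Column j of GH equals G times column j of H.
cols_ok = all(column(GH, j) == apply(G, column(H, j)) for j in range(len(H[0])))

# Row i of GH equals (row i of G, viewed as a 1xr matrix) times H.
rows_ok = all(GH[i] == mat_mul([G[i]], H)[0] for i in range(len(G)))

print(cols_ok, rows_ok)
```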

An application of those observations is that there is a matrix that just copies 
out the rows and columns. 

3.9 Definition The main diagonal (or principal diagonal or diagonal) of a
square matrix goes from the upper left to the lower right.

3.10 Definition An identity matrix is square and every entry is 0 except for 1's
in the main diagonal.

$$I_{n \times n} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ & \vdots & & \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

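3.11 Example Here is the 2×2 identity matrix leaving its multiplicand unchanged
when it acts from the right.

$$\begin{pmatrix} 1 & -2 \\ 0 & -2 \\ 1 & -1 \\ 4 & 3 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -2 \\ 0 & -2 \\ 1 & -1 \\ 4 & 3 \end{pmatrix}$$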



3.12 Example Here the 3x3 identity leaves its multiplicand unchanged both from 
the left 



and from the right. 












3 





1 




(i 


3 











1 






In short, an identity matrix is the identity element of the set of nxn matrices 
with respect to the operation of matrix multiplication. 

We next see two ways to generalize the identity matrix. The first is that if
we relax the ones to arbitrary reals then the resulting matrix will rescale whole
rows or columns.

3.13 Definition A diagonal matrix is square and has 0's off the main diagonal.

$$\begin{pmatrix} a_{1,1} & 0 & \cdots & 0 \\ 0 & a_{2,2} & \cdots & 0 \\ & \vdots & & \\ 0 & 0 & \cdots & a_{n,n} \end{pmatrix}$$

3.14 Example From the left, the action of multiplication by a diagonal matrix is
to rescale the rows.

From the right such a matrix rescales the columns.





The second generalization of identity matrices is that we can put a single one 
in each row and column in ways other than putting them down the diagonal. 

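3.15 Definition A permutation matrix is square and is all 0's except for a single 1
in each row and column.

3.16 Example From the left these matrices permute rows.

$$\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} = \begin{pmatrix} 7 & 8 & 9 \\ 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$$

From the right they permute columns.

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 3 & 1 \\ 5 & 6 & 4 \\ 8 & 9 & 7 \end{pmatrix}$$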
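
The actions of diagonal and permutation matrices from each side can be tried out directly. In this plain-Python sketch the helper `mat_mul` and the particular matrices `M`, `D`, and `P` are illustrative assumptions; `D` is a sample diagonal matrix and `P` a sample permutation matrix.

```python
def mat_mul(G, H):
    """Entry i,j is row i of G dotted with column j of H."""
    r = len(H)
    return [[sum(G[i][k] * H[k][j] for k in range(r)) for j in range(len(H[0]))]
            for i in range(len(G))]

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

D = [[2, 0, 0],      # sample diagonal matrix with diagonal entries 2, 3, -1
     [0, 3, 0],
     [0, 0, -1]]

P = [[0, 0, 1],      # sample permutation matrix: a single 1 in each row and column
     [1, 0, 0],
     [0, 1, 0]]

DM = mat_mul(D, M)   # from the left: rescales the rows of M
MD = mat_mul(M, D)   # from the right: rescales the columns of M
PM = mat_mul(P, M)   # from the left: permutes the rows of M
MP = mat_mul(M, P)   # from the right: permutes the columns of M

for name, result in (("DM", DM), ("MD", MD), ("PM", PM), ("MP", MP)):
    print(name, result)
```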





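We finish this subsection by applying these observations to get matrices that
perform Gauss's Method and Gauss-Jordan reduction.

3.17 Example We have seen how to produce a matrix that will rescale rows.
Multiplying by this diagonal matrix rescales the second row of the other matrix
by a factor of three.

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 2 & 1 & 1 \\ 0 & 1/3 & 1 & -1 \\ 1 & 0 & 2 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 2 & 1 & 1 \\ 0 & 1 & 3 & -3 \\ 1 & 0 & 2 & 0 \end{pmatrix}$$

We have seen how to produce a matrix that will swap rows. Multiplying by this
permutation matrix swaps the first and third rows.

$$\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 2 & 1 & 1 \\ 0 & 1 & 3 & -3 \\ 1 & 0 & 2 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & 3 & -3 \\ 0 & 2 & 1 & 1 \end{pmatrix}$$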





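To see how to perform a row combination, we observe something about those
two examples. The matrix that rescales the second row by a factor of three
arises in this way from the identity.

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \xrightarrow{\;3\rho_2\;} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Similarly, the matrix that swaps first and third rows arises in this way.

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \xrightarrow{\;\rho_1 \leftrightarrow \rho_3\;} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$$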

















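3.18 Example The 3×3 matrix that arises as

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \xrightarrow{\;-2\rho_2 + \rho_3\;} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{pmatrix}$$

will, when it acts from the left, perform the combination operation $-2\rho_2 + \rho_3$.

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & 3 & -3 \\ 0 & 2 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & 3 & -3 \\ 0 & 0 & -5 & 7 \end{pmatrix}$$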





3.19 Definition The elementary reduction matrices result from applying one
Gaussian operation to an identity matrix.

(1) $I \xrightarrow{\;k\rho_i\;} M_i(k)$ for $k \neq 0$

(2) $I \xrightarrow{\;\rho_i \leftrightarrow \rho_j\;} P_{i,j}$ for $i \neq j$

(3) $I \xrightarrow{\;k\rho_i + \rho_j\;} C_{i,j}(k)$ for $i \neq j$

3.20 Lemma Matrix multiplication can do Gaussian reduction.

(1) If $H \xrightarrow{\;k\rho_i\;} G$ then $M_i(k)H = G$.

(2) If $H \xrightarrow{\;\rho_i \leftrightarrow \rho_j\;} G$ then $P_{i,j}H = G$.

(3) If $H \xrightarrow{\;k\rho_i + \rho_j\;} G$ then $C_{i,j}(k)H = G$.

Proof Clear. QED

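3.21 Example This is the first system, from the first chapter, on which we 
performed Gauss's Method. 

$$\begin{array}{rcl} 3x_3 &=& 9 \\ x_1 + 5x_2 - 2x_3 &=& 2 \\ (1/3)x_1 + 2x_2 &=& 3 \end{array}$$

We can reduce it with matrix multiplication. Swap the first and third rows, 

$$\begin{pmatrix} 0&0&1 \\ 0&1&0 \\ 1&0&0 \end{pmatrix}
\begin{pmatrix} 0&0&3&9 \\ 1&5&-2&2 \\ 1/3&2&0&3 \end{pmatrix}
= \begin{pmatrix} 1/3&2&0&3 \\ 1&5&-2&2 \\ 0&0&3&9 \end{pmatrix}$$

triple the first row, 

$$\begin{pmatrix} 3&0&0 \\ 0&1&0 \\ 0&0&1 \end{pmatrix}
\begin{pmatrix} 1/3&2&0&3 \\ 1&5&-2&2 \\ 0&0&3&9 \end{pmatrix}
= \begin{pmatrix} 1&6&0&9 \\ 1&5&-2&2 \\ 0&0&3&9 \end{pmatrix}$$

and then add -1 times the first row to the second. 

$$\begin{pmatrix} 1&0&0 \\ -1&1&0 \\ 0&0&1 \end{pmatrix}
\begin{pmatrix} 1&6&0&9 \\ 1&5&-2&2 \\ 0&0&3&9 \end{pmatrix}
= \begin{pmatrix} 1&6&0&9 \\ 0&-1&-2&-7 \\ 0&0&3&9 \end{pmatrix}$$

Now back substitution will give the solution. 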
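As a check on this example, here is a short sketch that carries out the three row operations by matrix multiplication and then back-substitutes. The function names are illustrative only, and the augmented matrix is written out from the system in the example.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain row-by-column matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def back_substitute(aug):
    """Solve an upper-triangular augmented system with nonzero diagonal."""
    n = len(aug)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(aug[i][n] - known) / aug[i][i]
    return x

# Augmented matrix of: 3*x3 = 9, x1 + 5*x2 - 2*x3 = 2, (1/3)*x1 + 2*x2 = 3.
system = [[0, 0, 3, 9],
          [1, 5, -2, 2],
          [Fraction(1, 3), 2, 0, 3]]

swap_1_3 = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]   # rho_1 <-> rho_3
triple_1 = [[3, 0, 0], [0, 1, 0], [0, 0, 1]]   # 3*rho_1
combine = [[1, 0, 0], [-1, 1, 0], [0, 0, 1]]   # -rho_1 + rho_2

# The operation applied first sits closest to the system matrix.
echelon = matmul(combine, matmul(triple_1, matmul(swap_1_3, system)))
solution = back_substitute(echelon)
```

Because matrix multiplication is associative, the three elementary matrices could also be multiplied together first and the single product applied to the system.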

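3.22 Example Gauss-Jordan reduction works the same way. For the matrix ending 
the prior example, first adjust the leading entries 

$$\begin{pmatrix} 1&0&0 \\ 0&-1&0 \\ 0&0&1/3 \end{pmatrix}
\begin{pmatrix} 1&6&0&9 \\ 0&-1&-2&-7 \\ 0&0&3&9 \end{pmatrix}
= \begin{pmatrix} 1&6&0&9 \\ 0&1&2&7 \\ 0&0&1&3 \end{pmatrix}$$

and to finish, clear the third column and then the second column. 

$$\begin{pmatrix} 1&-6&0 \\ 0&1&0 \\ 0&0&1 \end{pmatrix}
\begin{pmatrix} 1&0&0 \\ 0&1&-2 \\ 0&0&1 \end{pmatrix}
\begin{pmatrix} 1&6&0&9 \\ 0&1&2&7 \\ 0&0&1&3 \end{pmatrix}
= \begin{pmatrix} 1&0&0&3 \\ 0&1&0&1 \\ 0&0&1&3 \end{pmatrix}$$
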
3.23 Corollary For any matrix H there are elementary reduction matrices $R_1, \ldots, R_r$ 
such that $R_r \cdot R_{r-1} \cdots R_1 \cdot H$ is in reduced echelon form. 
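The corollary can be illustrated with a sketch that performs Gauss-Jordan reduction while recording each step as an elementary matrix. This is one possible arrangement of the row operations, not necessarily the sequence a person would choose by hand; the function name `reduce_with_elementary` is an assumption made for this illustration.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def reduce_with_elementary(H):
    """Return ([R_1, ..., R_r], reduced echelon form of H)."""
    m, n = len(H), len(H[0])
    A = [[Fraction(x) for x in row] for row in H]
    steps = []

    def apply(E):
        nonlocal A
        steps.append(E)
        A = matmul(E, A)          # every step acts from the left

    row = 0
    for col in range(n):
        pivot = next((r for r in range(row, m) if A[r][col] != 0), None)
        if pivot is None:
            continue              # no leading entry in this column
        if pivot != row:          # row swap
            E = identity(m)
            E[row], E[pivot] = E[pivot], E[row]
            apply(E)
        if A[row][col] != 1:      # rescale the leading entry to 1
            E = identity(m)
            E[row][row] = 1 / A[row][col]
            apply(E)
        for r in range(m):        # clear the rest of the column
            if r != row and A[r][col] != 0:
                E = identity(m)
                E[r][row] = -A[r][col]
                apply(E)
        row += 1
        if row == m:
            break
    return steps, A

H = [[0, 0, 3, 9], [1, 5, -2, 2], [Fraction(1, 3), 2, 0, 3]]
steps, reduced = reduce_with_elementary(H)

# Multiply the recorded matrices together: product = R_r ... R_1.
product = identity(len(H))
for E in steps:
    product = matmul(E, product)
```

The final `product` is a single matrix that takes `H` to its reduced echelon form in one multiplication.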



Until now we have taken the point of view that our primary objects of study 
are vector spaces and the maps between them, and have adopted matrices only 
for computational convenience. This subsection shows that this isn't the whole 
story. Understanding matrix operations by how the entries combine can be 
useful also. In the rest of this book we shall continue to focus on maps as the 
primary objects but we will be pragmatic: if the matrix point of view gives 
some clearer idea then we will go with it. 

Exercises 

/ 3.24 Predict the result of each multiplication by an elementary reduction matrix, 
and then check by multiplying it out. 



(d) 



1 2 

3 4 
1 -1 
1 



(b) 

(e) 



1 

2 
1 1 
3 4 



1 2 
3 4 

1 

1 



(c) 



1 

-2 



3.25 Predict the result of each multiplication by a diagonal matrix, and then check 
by multiplying it out. 



(a) 



(b) 



/ 3.26 This table gives the number of hours of each type done by each worker, and 
the associated pay rates. Use matrices to compute the wages due. 

             regular   overtime                  wage
Alan            40        12         regular    $25.00
Betty           35         6         overtime   $45.00
Catherine       40        18
Donald          28         0

Remark. This illustrates that in practice we often want to compute linear 
combinations of rows and columns in a context where we really aren't interested in any 
associated linear maps. 
/ 3.27 The need to take linear combinations of rows and columns in tables of numbers 
arises often in practice. For instance, this is a map of part of Vermont and New 
York. 

[Figure: road map of the towns named in this exercise, including Swanton, Colchester, Winooski, and Burlington.]

In part because of Lake Champlain, there are no roads directly connecting some 
pairs of towns. For instance, there is no way to go from Winooski to Grand Isle 
without going through Colchester. (To simplify the graph many other roads and 
towns have been omitted. From top to bottom of this map is about forty miles.) 

(a) The adjacency matrix of a map is the square matrix whose i, j entry is the 
number of roads from city i to city j. Produce the adjacency matrix of this map 
(take the cities in alphabetical order). 

(b) A matrix is symmetric if it equals its transpose. Show that an adjacency 
matrix is symmetric. (These are all two-way streets. Vermont doesn't have many 
one-way streets.) 

(c) What is the significance of the square of the adjacency matrix? The cube? 
3.28 Find the product of this matrix with its transpose. 

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

/ 3.29 Prove that the diagonal matrices form a subspace of $\mathcal{M}_{n\times n}$. What is its 
dimension? 

3.30 Does the identity matrix represent the identity map if the bases are unequal? 

3.31 Show that every multiple of the identity commutes with every square matrix. 
Are there other matrices that commute with all square matrices? 

3.32 Prove or disprove: nonsingular matrices commute. 

/ 3.33 Show that the product of a permutation matrix and its transpose is an identity 
matrix. 

3.34 Show that if the first and second rows of G are equal then so are the first and 
second rows of GH. Generalize. 

3.35 Describe the product of two diagonal matrices. 

3.36 Write 



as the product of two elementary reduction matrices. 

/ 3.37 Show that if G has a row of zeros then GH (if defined) has a row of zeros. Does 
that work for columns? 

3.38 Show that the set of unit matrices forms a basis for $\mathcal{M}_{n\times m}$. 

3.39 Find the formula for the n-th power of this matrix. 



/ 3.40 The trace of a square matrix is the sum of the entries on its diagonal (its 
significance appears in Chapter Five). Show that Tr(GH) = Tr(HG). 

/ 3.41 A square matrix is upper triangular if its only nonzero entries lie above, or 
on, the diagonal. Show that the product of two upper triangular matrices is upper 
triangular. Does this hold for lower triangular also? 

3.42 A square matrix is a Markov matrix if each entry is between zero and one and 
the sum along each row is one. Prove that a product of Markov matrices is Markov. 

/ 3.43 Give an example of two matrices of the same rank and size with squares of 
differing rank. 

3.44 Combine the two generalizations of the identity matrix, the one allowing entries 
to be other than ones, and the one allowing the single one in each row and column 
to be off the diagonal. What is the action of this type of matrix? 

3.45 On a computer multiplications have traditionally been more costly than 
additions, so people have tried to reduce the number of multiplications used to 
compute a matrix product. 

(a) How many real number multiplications do we need in the formula we gave for 
the product of an m×r matrix and an r×n matrix? 

(b) Matrix multiplication is associative, so all associations yield the same result. 
The cost in number of multiplications, however, varies. Find the association 
requiring the fewest real number multiplications to compute the matrix product 
of a 5 X 1 matrix, a 1 x 20 matrix, a 20 x 5 matrix, and a 5 x 1 matrix. 

(c) (Very hard.) Find a way to multiply two 2x2 matrices using only seven 
multiplications instead of the eight suggested by the naive approach. 

? 3.46 [Putnam, 1990, A-5] If A and B are square matrices of the same size such that 
ABAB = 0, does it follow that BABA = 0? 

3.47 [Am. Math. Mon., Dec. 1966] Demonstrate these four assertions to get an 
alternate proof that column rank equals row rank. 

(a) $\vec{y}\cdot\vec{y} = 0$ iff $\vec{y} = \vec{0}$. 

(b) $A\vec{x} = \vec{0}$ iff $A^{T}A\vec{x} = \vec{0}$. 

(c) $\dim(\mathscr{R}(A)) = \dim(\mathscr{R}(A^{T}A))$. 

(d) col rank(A) = col rank($A^{T}$) = row rank(A). 

3.48 [Ackerson] Prove (where A is an n×n matrix and so defines a transformation of 
any n-dimensional space V with respect to B, B where B is a basis) that 
$\dim(\mathscr{R}(A) \cap \mathscr{N}(A)) = \dim(\mathscr{R}(A)) - \dim(\mathscr{R}(A^2))$. Conclude 

(a) $\mathscr{N}(A) \subseteq \mathscr{R}(A)$ iff $\dim(\mathscr{N}(A)) = \dim(\mathscr{R}(A)) - \dim(\mathscr{R}(A^2))$; 

(b) $\mathscr{R}(A) \subseteq \mathscr{N}(A)$ iff $A^2 = 0$; 

(c) $\mathscr{R}(A) = \mathscr{N}(A)$ iff $A^2 = 0$ and $\dim(\mathscr{N}(A)) = \dim(\mathscr{R}(A))$; 

(d) $\dim(\mathscr{R}(A) \cap \mathscr{N}(A)) = 0$ iff $\dim(\mathscr{R}(A)) = \dim(\mathscr{R}(A^2))$; 

(e) (Requires the Direct Sum subsection, which is optional.) $V = \mathscr{R}(A) \oplus \mathscr{N}(A)$ 
iff $\dim(\mathscr{R}(A)) = \dim(\mathscr{R}(A^2))$. 



IV.4 Inverses 

We finish this section by considering how to represent the inverse of a linear 
map. 

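We first recall some things about inverses.* Where $\pi\colon \mathbb{R}^3 \to \mathbb{R}^2$ is the 
projection map and $\iota\colon \mathbb{R}^2 \to \mathbb{R}^3$ is the embedding 

$$\begin{pmatrix} x\\y\\z \end{pmatrix} \stackrel{\pi}{\longmapsto} \begin{pmatrix} x\\y \end{pmatrix}
\qquad
\begin{pmatrix} x\\y \end{pmatrix} \stackrel{\iota}{\longmapsto} \begin{pmatrix} x\\y\\0 \end{pmatrix}$$

then the composition $\pi\circ\iota$ is the identity map $\pi\circ\iota = \mathrm{id}$ on $\mathbb{R}^2$. 

$$\begin{pmatrix} x\\y \end{pmatrix} \stackrel{\iota}{\longmapsto} \begin{pmatrix} x\\y\\0 \end{pmatrix} \stackrel{\pi}{\longmapsto} \begin{pmatrix} x\\y \end{pmatrix}$$

We say that $\iota$ is a right inverse of $\pi$ or, what is the same thing, that $\pi$ is a 
left inverse of $\iota$. However, composition in the other order $\iota\circ\pi$ doesn't give the 
identity map; here is a vector that is not sent to itself under $\iota\circ\pi$. 

$$\begin{pmatrix} 0\\0\\1 \end{pmatrix} \stackrel{\pi}{\longmapsto} \begin{pmatrix} 0\\0 \end{pmatrix} \stackrel{\iota}{\longmapsto} \begin{pmatrix} 0\\0\\0 \end{pmatrix}$$

In fact, $\pi$ has no left inverse at all. For, if f were to be a left inverse of $\pi$ then 
we would have 

$$\begin{pmatrix} x\\y\\z \end{pmatrix} \stackrel{\pi}{\longmapsto} \begin{pmatrix} x\\y \end{pmatrix} \stackrel{f}{\longmapsto} \begin{pmatrix} x\\y\\z \end{pmatrix}$$

for all of the infinitely many z's. But no function can send a single argument to 
more than one value. (An example of a function with no inverse on either side 
is the zero transformation on $\mathbb{R}^2$.) 

* More information on function inverses is in the appendix. 

Chapter Three. Maps Between Spaces 
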

Some functions have a two-sided inverse, another function that is the inverse 
of the first both from the left and from the right. For instance, the map given 
by $\vec{v} \mapsto 2\cdot\vec{v}$ has the two-sided inverse $\vec{v} \mapsto (1/2)\cdot\vec{v}$. The appendix shows that a 
function has a two-sided inverse if and only if it is both one-to-one and onto. 
The appendix also shows that if a function f has a two-sided inverse then it is 
unique, and so we call it 'the' inverse and denote it $f^{-1}$. 

In addition, recall that we have shown in Theorem II.2.21 that if a linear 
map has a two-sided inverse then that inverse is also linear. 

Thus, our goal in this subsection is, where a linear h has an inverse, to find 
the relationship between $\mathrm{Rep}_{B,D}(h)$ and $\mathrm{Rep}_{D,B}(h^{-1})$. 

4.1 Definition A matrix G is a left inverse matrix of the matrix H if GH is the 
identity matrix. It is a right inverse matrix if HG is the identity. A matrix H 
with a two-sided inverse is an invertible matrix. That two-sided inverse is the 
inverse matrix and is denoted $H^{-1}$. 

Because of the correspondence between linear maps and matrices, statements 
about map inverses translate into statements about matrix inverses. 

4.2 Lemma If a matrix has both a left inverse and a right inverse then the two 
are equal. 

4.3 Theorem A matrix is invertible if and only if it is nonsingular. 

Proof (For both results.) Given a matrix H, fix spaces of appropriate dimension 
for the domain and codomain. Fix bases for these spaces. With respect to these 
bases, H represents a map h. The statements are true about the map and 
therefore they are true about the matrix. QED 



4.4 Lemma A product of invertible matrices is invertible: if G and H are invertible 
and if GH is defined then GH is invertible and $(GH)^{-1} = H^{-1}G^{-1}$. 

Proof Because the two matrices are invertible they are square. Because their 
product is defined they must be square of the same dimension, n×n. So by fixing 
a basis for $\mathbb{R}^n$ (we can use the standard basis) we get maps $g, h\colon \mathbb{R}^n \to \mathbb{R}^n$ 
that are associated with the matrices, $G = \mathrm{Rep}_{\mathcal{E}_n,\mathcal{E}_n}(g)$ and $H = \mathrm{Rep}_{\mathcal{E}_n,\mathcal{E}_n}(h)$. 

Consider $h^{-1}g^{-1}$. By the prior paragraph this composition is defined. This 
map is a two-sided inverse of $gh$ since $(h^{-1}g^{-1})(gh) = h^{-1}(\mathrm{id})h = h^{-1}h = \mathrm{id}$ 
and $(gh)(h^{-1}g^{-1}) = g(\mathrm{id})g^{-1} = gg^{-1} = \mathrm{id}$. The matrices representing the maps 
reflect this equality. QED 

This is the arrow diagram giving the relationship between map inverses and 
matrix inverses. It is a special case of the diagram for function composition and 
matrix multiplication. 

                    W wrt C
          h, H   /           \   h^{-1}, H^{-1}
                /             \
        V wrt B  --- id, I --->  V wrt B

Beyond its place in our general program of seeing how to represent map 
operations, another reason for our interest in inverses comes from solving linear 
systems. A linear system is equivalent to a matrix equation, as here. 

$$\begin{array}{rcl} x_1 + x_2 &=& 3 \\ 2x_1 - x_2 &=& 2 \end{array}
\quad\Longleftrightarrow\quad
\begin{pmatrix} 1&1 \\ 2&-1 \end{pmatrix}\begin{pmatrix} x_1\\x_2 \end{pmatrix} = \begin{pmatrix} 3\\2 \end{pmatrix} \qquad (*)$$

By fixing spaces and bases (for instance, $\mathbb{R}^2, \mathbb{R}^2$ with the standard bases), we 
take the matrix H to represent a map h. The matrix equation then becomes 
this linear map equation. 

$$h(\vec{x}) = \vec{d} \qquad (**)$$

Asking for a solution to (*) is the same as asking in (**) for the domain vector $\vec{x}$ 
that h maps to the result $\vec{d}$. If we had a left inverse map g then we could apply 
it to both sides $g\circ h(\vec{x}) = g(\vec{d})$, which simplifies to $\vec{x} = g(\vec{d})$. In terms of the 
matrices, we multiply $\mathrm{Rep}_{D,B}(g)\cdot\mathrm{Rep}_D(\vec{d})$ to get $\mathrm{Rep}_B(\vec{x})$. 

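4.5 Example We can find a left inverse for the matrix just given 

$$\begin{pmatrix} m&n \\ p&q \end{pmatrix}\begin{pmatrix} 1&1 \\ 2&-1 \end{pmatrix} = \begin{pmatrix} 1&0 \\ 0&1 \end{pmatrix}$$

by using Gauss's Method to solve the resulting linear system. 

$$\begin{array}{rcl} m + 2n &=& 1 \\ m - n &=& 0 \\ p + 2q &=& 0 \\ p - q &=& 1 \end{array}$$

Answer: m = 1/3, n = 1/3, p = 2/3, and q = -1/3. This matrix is actually the 
two-sided inverse of H; the check is easy. With it we can solve the system (*) 
above. 

$$\begin{pmatrix} x\\y \end{pmatrix} = \begin{pmatrix} 1/3&1/3 \\ 2/3&-1/3 \end{pmatrix}\begin{pmatrix} 3\\2 \end{pmatrix} = \begin{pmatrix} 5/3\\4/3 \end{pmatrix}$$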
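The "easy check" mentioned in this example can be done by machine. This sketch, with variable names chosen only for illustration, verifies with exact fractions that the matrix found is an inverse on both sides of H = (1 1; 2 -1) and then uses it to solve the system with constants 3 and 2.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain row-by-column matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

H = [[1, 1],
     [2, -1]]
G = [[Fraction(1, 3), Fraction(1, 3)],      # m, n
     [Fraction(2, 3), Fraction(-1, 3)]]     # p, q
identity = [[1, 0], [0, 1]]

is_left_inverse = matmul(G, H) == identity
is_right_inverse = matmul(H, G) == identity

# Solve H x = (3, 2) by multiplying the constants by the inverse.
solution = matmul(G, [[3], [2]])
```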

4.6 Remark Why do this when we have Gauss's Method? Beyond the conceptual 
appeal of representing the map inverse operation, solving linear systems this 
way has at least two advantages. 

First, once we have done the work of finding an inverse then solving a system 
with the same coefficients but different constants is fast: if we change the entries 
on the right of the system (*) then we get a related problem 

$$\begin{pmatrix} 1&1 \\ 2&-1 \end{pmatrix}\begin{pmatrix} x\\y \end{pmatrix} = \begin{pmatrix} 5\\1 \end{pmatrix}$$

that our inverse method solves quickly. 

$$\begin{pmatrix} x\\y \end{pmatrix} = \begin{pmatrix} 1/3&1/3 \\ 2/3&-1/3 \end{pmatrix}\begin{pmatrix} 5\\1 \end{pmatrix} = \begin{pmatrix} 2\\3 \end{pmatrix}$$

Another advantage of inverses is that we can explore a system's sensitivity 
to changes in the constants. For example, tweaking the 3 on the right of the 
system (*) to 

$$\begin{pmatrix} 1&1 \\ 2&-1 \end{pmatrix}\begin{pmatrix} x_1\\x_2 \end{pmatrix} = \begin{pmatrix} 3.01\\2 \end{pmatrix}$$

and solving with the inverse 

$$\begin{pmatrix} 1/3&1/3 \\ 2/3&-1/3 \end{pmatrix}\begin{pmatrix} 3.01\\2 \end{pmatrix} = \begin{pmatrix} (1/3)(3.01) + (1/3)(2) \\ (2/3)(3.01) - (1/3)(2) \end{pmatrix}$$

shows that the first component of the solution changes by 1/3 of the tweak, 
while the second component moves by 2/3 of the tweak. This is sensitivity 
analysis. For instance, we could use it to decide how accurately we must specify 
the data in a linear model to ensure that the solution has a desired accuracy. 
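Both advantages can be seen in a small sketch. It assumes the inverse (1/3 1/3; 2/3 -1/3) of the coefficient matrix (1 1; 2 -1) is already known; the function names `solve` and `sensitivity` are illustrative, not from the text.

```python
from fractions import Fraction

# Inverse of the coefficient matrix (1 1; 2 -1), found once and reused.
H_inverse = [[Fraction(1, 3), Fraction(1, 3)],
             [Fraction(2, 3), Fraction(-1, 3)]]

def solve(constants):
    """Solve H x = constants by multiplying with the known inverse."""
    return [sum(entry * c for entry, c in zip(row, constants))
            for row in H_inverse]

def sensitivity(constants, index, tweak):
    """Change in each solution component when constants[index] moves by tweak."""
    moved = list(constants)
    moved[index] += tweak
    return [after - before
            for after, before in zip(solve(moved), solve(constants))]

# First advantage: new constants cost only one matrix-vector product.
related = solve([5, 1])

# Second advantage: move the 3 to 3.01 and see how the solution responds.
tweak = Fraction(1, 100)
change = sensitivity([Fraction(3), Fraction(2)], 0, tweak)
```

The entries of `change` are the first column of the inverse scaled by the tweak, which is why the components move by 1/3 and 2/3 of it.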

We finish by describing the computational procedure that we shall use to 
find the inverse matrix. 




4.7 Lemma A matrix H is invertible if and only if it can be written as the product 
of elementary reduction matrices. We can compute the inverse by applying 
to the identity matrix the same row steps, in the same order, as we use to 
Gauss-Jordan reduce H. 

Proof The matrix H is invertible if and only if it is nonsingular and thus 
Gauss-Jordan reduces to the identity. By Corollary 3.23 we can do this reduction 
with elementary matrices. 

$$R_r \cdot R_{r-1} \cdots R_1 \cdot H = I \qquad (*)$$

For the first sentence of the result, note that elementary matrices are invertible 
(because elementary row operations are reversible) and that their inverses are 
also elementary. Apply $R_r^{-1}$ from the left to both sides of (*). Then apply $R_{r-1}^{-1}$, 
etc. The result gives H as the product of elementary matrices $H = R_1^{-1} \cdots R_r^{-1} \cdot I$ 
(the I here covers the trivial r = 0 case). 

For the second sentence, rewrite (*) as $(R_r \cdot R_{r-1} \cdots R_1) \cdot H = I$ to recognize 
that $H^{-1} = R_r \cdot R_{r-1} \cdots R_1 \cdot I$. Restated, applying $R_1$ to the identity, followed 
by $R_2$, etc., yields the inverse of H. QED 
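4.8 Example To find the inverse of 

$$\begin{pmatrix} 1&1 \\ 2&-1 \end{pmatrix}$$

we do Gauss-Jordan reduction, meanwhile performing the same operations on 
the identity. For clerical convenience we write the matrix and the identity 
side-by-side, and do the reduction steps together. 

$$\left(\begin{array}{cc|cc} 1&1&1&0 \\ 2&-1&0&1 \end{array}\right)
\xrightarrow{-2\rho_1+\rho_2}
\left(\begin{array}{cc|cc} 1&1&1&0 \\ 0&-3&-2&1 \end{array}\right)$$
$$\xrightarrow{-(1/3)\rho_2}
\left(\begin{array}{cc|cc} 1&1&1&0 \\ 0&1&2/3&-1/3 \end{array}\right)
\xrightarrow{-\rho_2+\rho_1}
\left(\begin{array}{cc|cc} 1&0&1/3&1/3 \\ 0&1&2/3&-1/3 \end{array}\right)$$

This calculation has found the inverse. 

$$\begin{pmatrix} 1&1 \\ 2&-1 \end{pmatrix}^{-1} = \begin{pmatrix} 1/3&1/3 \\ 2/3&-1/3 \end{pmatrix}$$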
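The side-by-side procedure translates directly into code. The sketch below is one way to arrange it, using exact fractions: it reduces the left half of the augmented matrix and returns the right half, or `None` when the left half cannot be brought to the identity. The function name `inverse` is chosen for this illustration.

```python
from fractions import Fraction

def inverse(H):
    """Gauss-Jordan reduce [H | I]; return the right half, or None if H is singular."""
    n = len(H)
    # Build the augmented matrix with the identity on the right.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(H)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None                      # left half won't reduce to the identity
        aug[col], aug[pivot] = aug[pivot], aug[col]          # row swap, if needed
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]              # rescale leading entry to 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

H = [[1, 1],
     [2, -1]]
H_inverse = inverse(H)
```

A matrix with proportional rows, such as `[[1, 1], [2, 2]]`, makes the function return `None`.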




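4.9 Example This one happens to start with a row swap. 

$$\left(\begin{array}{ccc|ccc} 0&3&-1&1&0&0 \\ 1&0&1&0&1&0 \\ 1&-1&0&0&0&1 \end{array}\right)
\xrightarrow{\rho_1\leftrightarrow\rho_2}
\left(\begin{array}{ccc|ccc} 1&0&1&0&1&0 \\ 0&3&-1&1&0&0 \\ 1&-1&0&0&0&1 \end{array}\right)$$
$$\xrightarrow{-\rho_1+\rho_3}
\left(\begin{array}{ccc|ccc} 1&0&1&0&1&0 \\ 0&3&-1&1&0&0 \\ 0&-1&-1&0&-1&1 \end{array}\right)$$
$$\vdots$$
$$\longrightarrow
\left(\begin{array}{ccc|ccc} 1&0&0&1/4&1/4&3/4 \\ 0&1&0&1/4&1/4&-1/4 \\ 0&0&1&-1/4&3/4&-3/4 \end{array}\right)$$
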

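4.10 Example We can detect a non-invertible matrix when the left half won't 
reduce to the identity. 

$$\left(\begin{array}{cc|cc} 1&1&1&0 \\ 2&2&0&1 \end{array}\right)
\xrightarrow{-2\rho_1+\rho_2}
\left(\begin{array}{cc|cc} 1&1&1&0 \\ 0&0&-2&1 \end{array}\right)$$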



With this procedure we can give a formula for the inverse of a general 2x2 
matrix, which is worth memorizing. But larger matrices have more complex 
formulas so we will wait for more explanation in the next chapter. 
4.11 Corollary The inverse for a 2×2 matrix exists and equals 

$$\begin{pmatrix} a&b \\ c&d \end{pmatrix}^{-1} = \frac{1}{ad-bc}\begin{pmatrix} d&-b \\ -c&a \end{pmatrix}$$

if and only if $ad - bc \neq 0$. 

Proof This computation is Exercise 21. QED 
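The 2×2 formula is short enough to code directly. A minimal sketch follows, with the function name `inverse_2x2` chosen for this illustration; it returns `None` when ad - bc is zero, the case in which no inverse exists.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    """Inverse of the matrix (a b; c d) by the ad - bc formula, or None."""
    det = Fraction(a * d - b * c)
    if det == 0:
        return None                       # no inverse when ad - bc = 0
    return [[d / det, -b / det],
            [-c / det, a / det]]

example = inverse_2x2(1, 1, 2, -1)        # the matrix (1 1; 2 -1)
singular = inverse_2x2(1, 1, 2, 2)        # rows are proportional
```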

We have seen here, as in the Mechanics of Matrix Multiplication subsection, 
that we can exploit the correspondence between linear maps and matrices. So 
we can fruitfully study both maps and matrices, translating back and forth to 
whichever helps the most. 

Over this whole section we have developed an algebra system for matrices. 
We can compare it with the familiar algebra system for the real numbers. Here 
we are working not with numbers but with matrices. We have matrix addition 
and subtraction operations, and they work in much the same way as the real 
number operations, except that they only combine same-sized matrices. We 
have scalar multiplication, which is in some ways another extension of real 
number multiplication. We also have a matrix multiplication operation and 
a multiplicative inverse. These operations are somewhat like the familiar real 
number ones (associativity, and distributivity over addition, for example), but 
there are differences (failure of commutativity). This matrix system provides an 
example that algebra systems other than the elementary real number system 
can be interesting and useful. 

Exercises 

4.12 Supply the intermediate steps in Example 4.9. 
/ 4.13 Use Corollary 4.11 to decide if each matrix has an inverse. 

;) c 4) <=) (J 1 

/ 4.14 For each invertible matrix in the prior problem, use Corollary 4.11 to find its 
inverse. 

/ 4.15 Find the inverse, if it exists, by using the Gauss- Jordan Method. Check the 
answers for the 2x2 matrices with Corollary 4.11. 

(o \) (3 1 

/O 1 5\ /2 2 

(e) -2 4 (f) 1 -2 
V2 3 -2/ \4 -2 

/ 4.16 What matrix has this one for its inverse? 

$$\begin{pmatrix} 1&3 \\ 2&5 \end{pmatrix}$$

4.17 How does the inverse operation interact with scalar multiplication and addition 
of matrices? 
(a) What is the inverse of rH? 
(b) Is $(H + G)^{-1} = H^{-1} + G^{-1}$? 
/ 4.18 Is $(T^k)^{-1} = (T^{-1})^k$? 

4.19 Is $H^{-1}$ invertible? 

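4.20 For each real number $\theta$ let $t_\theta\colon \mathbb{R}^2 \to \mathbb{R}^2$ be represented with respect to the 
standard bases by this matrix. 

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

Show that $t_{\theta_1+\theta_2} = t_{\theta_1}\circ t_{\theta_2}$. Show also that $t_\theta^{-1} = t_{-\theta}$. 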

4.21 Do the calculations for the proof of Corollary 4.11. 

4.22 Show that this matrix 

has infinitely many right inverses. Show also that it has no left inverse. 

4.23 In the review of inverses example, starting this subsection, how many left 
inverses has $\iota$? 

4.24 If a matrix has infinitely many right-inverses, can it have infinitely many 
left-inverses? Must it have? 

4.25 Assume that $g\colon V \to W$ is linear. One of these is true, the other is false. Which 
is which? 

(a) If $f\colon W \to V$ is a left inverse of g then f must be linear. 

(b) If $f\colon W \to V$ is a right inverse of g then f must be linear. 

/ 4.26 Assume that H is invertible and that HG is the zero matrix. Show that G is a 
zero matrix. 

4.27 Prove that if H is invertible then the inverse commutes with a matrix, $GH^{-1} = H^{-1}G$, 
if and only if H itself commutes with that matrix, $GH = HG$. 
/ 4.28 Show that if T is square and if $T^4$ is the zero matrix then $(I-T)^{-1} = I+T+T^2+T^3$. 
Generalize. 

/ 4.29 Let D be diagonal. Describe $D^2$, $D^3$, ... , etc. Describe $D^{-1}$, $D^{-2}$, ... , etc. 
Define $D^0$ appropriately. 

4.30 Prove that any matrix row-equivalent to an invertible matrix is also invertible. 

4.31 The first question below appeared as Exercise 29. 

(a) Show that the rank of the product of two matrices is less than or equal to the 
minimum of the rank of each. 

(b) Show that if T and S are square then TS = I if and only if ST = I. 

4.32 Show that the inverse of a permutation matrix is its transpose. 

4.33 The first two parts of this question appeared as Exercise 26. 

(a) Show that $(GH)^{T} = H^{T}G^{T}$. 

(b) A square matrix is symmetric if each i, j entry equals the j, i entry (that is, 
if the matrix equals its transpose). Show that the matrices $HH^{T}$ and $H^{T}H$ are 
symmetric. 

(c) Show that the inverse of the transpose is the transpose of the inverse. 

(d) Show that the inverse of a symmetric matrix is symmetric. 
/ 4.34 The items starting this question appeared as Exercise 31. 

(a) Prove that the composition of the projections $\pi_x, \pi_y\colon \mathbb{R}^3 \to \mathbb{R}^3$ is the zero 
map despite that neither is the zero map. 

(b) Prove that the composition of the derivatives $d^2/dx^2,\ d^3/dx^3\colon \mathcal{P}_4 \to \mathcal{P}_4$ is the 
zero map despite that neither map is the zero map. 

(c) Give matrix equations representing each of the prior two items. 
When two things multiply to give zero despite that neither is zero, each is said to 
be a zero divisor. Prove that no zero divisor is invertible. 

4.35 In real number algebra, there are exactly two numbers, 1 and -1, that are 
their own multiplicative inverse. Does $H^2 = I$ have exactly two solutions for 2×2 
matrices? 

4.36 Is the relation 'is a two-sided inverse of' transitive? Reflexive? Symmetric? 

4.37 [Am. Math. Mon., Nov. 1951] Prove: if the sum of the elements of a square 
matrix is k, then the sum of the elements in each row of the inverse matrix is 1 /k. 
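V Change of Basis 

Representations vary with the bases. For instance, $\vec{e}_1 \in \mathbb{R}^2$ has two different 
representations 

$$\mathrm{Rep}_{\mathcal{E}_2}(\vec{e}_1) = \begin{pmatrix} 1\\0 \end{pmatrix} \qquad \mathrm{Rep}_{B}(\vec{e}_1) = \begin{pmatrix} 1/2\\1/2 \end{pmatrix}$$

with respect to the standard basis and this one. 

$$B = \langle \begin{pmatrix} 1\\1 \end{pmatrix}, \begin{pmatrix} 1\\-1 \end{pmatrix} \rangle$$

The same is true for maps; with respect to the basis pairs $\mathcal{E}_2, \mathcal{E}_2$ and $\mathcal{E}_2, B$, the 
identity map has two different representations. 

$$\mathrm{Rep}_{\mathcal{E}_2,\mathcal{E}_2}(\mathrm{id}) = \begin{pmatrix} 1&0 \\ 0&1 \end{pmatrix} \qquad \mathrm{Rep}_{\mathcal{E}_2,B}(\mathrm{id}) = \begin{pmatrix} 1/2&1/2 \\ 1/2&-1/2 \end{pmatrix}$$

With our point of view that the objects of our studies are vectors and maps, by 
fixing bases we are adopting a scheme of tags or names for these objects that 
are convenient for calculations. We will now see how to translate among these 
names, so we will see exactly how the representations vary as the bases vary. 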



V.l Changing Representations of Vectors 

In converting $\mathrm{Rep}_B(\vec{v})$ to $\mathrm{Rep}_D(\vec{v})$ the underlying vector $\vec{v}$ doesn't change. Thus, 
this translation is accomplished by the identity map on the space, described 
so that the domain space vectors are represented with respect to B and the 
codomain space vectors are represented with respect to D. 

        V wrt B
           |
        id |
           v
        V wrt D

(The diagram is vertical to fit with the ones in the next subsection.) 

1.1 Definition The change of basis matrix for bases $B, D \subset V$ is the 
representation of the identity map $\mathrm{id}\colon V \to V$ with respect to those bases. 

$$\mathrm{Rep}_{B,D}(\mathrm{id}) = \begin{pmatrix} \vdots & & \vdots \\ \mathrm{Rep}_D(\vec{\beta}_1) & \cdots & \mathrm{Rep}_D(\vec{\beta}_n) \\ \vdots & & \vdots \end{pmatrix}$$





1.2 Remark Perhaps a better name would be 'change of representation matrix' 
but this one is standard. 



1.3 Lemma Left-multiplication by the change of basis matrix for B, D converts 
a representation with respect to B to one with respect to D. Conversely, if 
left-multiplication by a matrix changes bases, $M \cdot \mathrm{Rep}_B(\vec{v}) = \mathrm{Rep}_D(\vec{v})$, then M is 
a change of basis matrix. 

Proof The first sentence holds because matrix-vector multiplication represents 
a map application, $\mathrm{Rep}_{B,D}(\mathrm{id}) \cdot \mathrm{Rep}_B(\vec{v}) = \mathrm{Rep}_D(\mathrm{id}(\vec{v})) = \mathrm{Rep}_D(\vec{v})$ for each $\vec{v}$. 
For the second sentence, with respect to B, D the matrix M represents a linear 
map whose action is to map each vector to itself, and is therefore the identity 
map. QED 

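1.4 Example With these bases for $\mathbb{R}^2$, 

$$B = \langle \begin{pmatrix} 2\\1 \end{pmatrix}, \begin{pmatrix} 1\\0 \end{pmatrix} \rangle \qquad D = \langle \begin{pmatrix} -1\\1 \end{pmatrix}, \begin{pmatrix} 1\\1 \end{pmatrix} \rangle$$

because 

$$\mathrm{Rep}_D(\mathrm{id}(\begin{pmatrix} 2\\1 \end{pmatrix})) = \begin{pmatrix} -1/2\\3/2 \end{pmatrix}_D \qquad \mathrm{Rep}_D(\mathrm{id}(\begin{pmatrix} 1\\0 \end{pmatrix})) = \begin{pmatrix} -1/2\\1/2 \end{pmatrix}_D$$

the change of basis matrix is this. 

$$\mathrm{Rep}_{B,D}(\mathrm{id}) = \begin{pmatrix} -1/2&-1/2 \\ 3/2&1/2 \end{pmatrix}$$

For instance, if we find the representations of $\vec{e}_2$ 

$$\mathrm{Rep}_B(\begin{pmatrix} 0\\1 \end{pmatrix}) = \begin{pmatrix} 1\\-2 \end{pmatrix} \qquad \mathrm{Rep}_D(\begin{pmatrix} 0\\1 \end{pmatrix}) = \begin{pmatrix} 1/2\\1/2 \end{pmatrix}$$

then the matrix will do the conversion. 

$$\begin{pmatrix} -1/2&-1/2 \\ 3/2&1/2 \end{pmatrix}\begin{pmatrix} 1\\-2 \end{pmatrix} = \begin{pmatrix} 1/2\\1/2 \end{pmatrix}$$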
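For bases of $\mathbb{R}^2$ the change of basis matrix can be computed mechanically. The sketch below rests on two assumptions that should be checked against the example: that the bases are B = ⟨(2,1), (1,0)⟩ and D = ⟨(-1,1), (1,1)⟩ (these reproduce the matrix shown), and that basis vectors are stored as the columns of a matrix. Column j of $\mathrm{Rep}_{B,D}(\mathrm{id})$ solves "D-matrix times c equals the j-th vector of B", so the whole matrix is the inverse of the D-matrix times the B-matrix. The function names are illustrative.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain row-by-column matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix by the ad - bc formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("the columns do not form a basis")
    return [[d / det, -b / det], [-c / det, a / det]]

def change_of_basis(B_columns, D_columns):
    """Rep_{B,D}(id) for bases of R^2 given as matrices of column vectors."""
    return matmul(inverse_2x2(D_columns), B_columns)

B_columns = [[2, 1],      # columns are (2,1) and (1,0)
             [1, 0]]
D_columns = [[-1, 1],     # columns are (-1,1) and (1,1)
             [1, 1]]

M = change_of_basis(B_columns, D_columns)

# Convert the representation (1, -2) with respect to B into one with respect to D.
converted = matmul(M, [[1], [-2]])
```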

We finish this subsection by recognizing that the change of basis matrices 
form a familiar set. 



1.5 Lemma A matrix changes bases if and only if it is nonsingular. 

Proof For the 'only if' direction, if left-multiplication by a matrix changes 
bases then the matrix represents an invertible function, simply because we can 
invert the function by changing the bases back. Such a matrix is itself invertible, 
and so is nonsingular. 

To finish we will show that any nonsingular matrix M performs a change of 
basis operation from any given starting basis B to some ending basis. Because the 
matrix is nonsingular it will Gauss-Jordan reduce to the identity. If the matrix 
is the identity I then the statement is obvious. Otherwise there are elementary 
reduction matrices such that $R_r \cdots R_1 \cdot M = I$ with $r \geq 1$. Elementary matrices 
are invertible and their inverses are also elementary so multiplying both sides of 
that equation from the left by $R_r^{-1}$, then by $R_{r-1}^{-1}$, etc., gives M as a product 
of elementary matrices $M = R_1^{-1} \cdots R_r^{-1}$. (We've combined $R_r^{-1} I$ to make 
$R_r^{-1}$; because $r \geq 1$ we can always make the I disappear in this way, which we 
need to do because it isn't an elementary matrix.) 

Thus, we will be done if we show that elementary matrices change a given 
basis to another basis, for then $R_r^{-1}$ changes B to some other basis $B_r$, and 
$R_{r-1}^{-1}$ changes $B_r$ to some $B_{r-1}$, etc., and the net effect is that M changes B 
to $B_1$. We will prove this by covering the three types of elementary matrices 
separately; here are the three cases. 

$$\begin{pmatrix} c_1\\ \vdots\\ c_i\\ \vdots\\ c_n \end{pmatrix} \stackrel{M_i(k)}{\longmapsto} \begin{pmatrix} c_1\\ \vdots\\ kc_i\\ \vdots\\ c_n \end{pmatrix}
\qquad
\begin{pmatrix} c_1\\ \vdots\\ c_i\\ \vdots\\ c_j\\ \vdots\\ c_n \end{pmatrix} \stackrel{P_{i,j}}{\longmapsto} \begin{pmatrix} c_1\\ \vdots\\ c_j\\ \vdots\\ c_i\\ \vdots\\ c_n \end{pmatrix}
\qquad
\begin{pmatrix} c_1\\ \vdots\\ c_i\\ \vdots\\ c_j\\ \vdots\\ c_n \end{pmatrix} \stackrel{C_{i,j}(k)}{\longmapsto} \begin{pmatrix} c_1\\ \vdots\\ c_i\\ \vdots\\ kc_i + c_j\\ \vdots\\ c_n \end{pmatrix}$$

Applying a row-multiplication matrix $M_i(k)$ changes a representation with 
respect to $\langle \vec{\beta}_1, \ldots, \vec{\beta}_i, \ldots, \vec{\beta}_n \rangle$ to one with respect to $\langle \vec{\beta}_1, \ldots, (1/k)\vec{\beta}_i, \ldots, \vec{\beta}_n \rangle$. 

$$\vec{v} = c_1\vec{\beta}_1 + \cdots + c_i\vec{\beta}_i + \cdots + c_n\vec{\beta}_n
\ \longmapsto\ c_1\vec{\beta}_1 + \cdots + kc_i\cdot(1/k)\vec{\beta}_i + \cdots + c_n\vec{\beta}_n = \vec{v}$$

We can easily see that the second one is a basis, given that the first is a basis 
and that $k \neq 0$ is a restriction in the definition of a row-multiplication matrix. 
Similarly, left-multiplication by a row-swap matrix $P_{i,j}$ changes a representation 
with respect to the basis $\langle \vec{\beta}_1, \ldots, \vec{\beta}_i, \ldots, \vec{\beta}_j, \ldots, \vec{\beta}_n \rangle$ into one with respect to 
this basis $\langle \vec{\beta}_1, \ldots, \vec{\beta}_j, \ldots, \vec{\beta}_i, \ldots, \vec{\beta}_n \rangle$. 

$$\vec{v} = c_1\vec{\beta}_1 + \cdots + c_i\vec{\beta}_i + \cdots + c_j\vec{\beta}_j + \cdots + c_n\vec{\beta}_n
\ \longmapsto\ c_1\vec{\beta}_1 + \cdots + c_j\vec{\beta}_j + \cdots + c_i\vec{\beta}_i + \cdots + c_n\vec{\beta}_n = \vec{v}$$

And, a representation with respect to $\langle \vec{\beta}_1, \ldots, \vec{\beta}_i, \ldots, \vec{\beta}_j, \ldots, \vec{\beta}_n \rangle$ changes via 
left-multiplication by a row-combination matrix $C_{i,j}(k)$ into a representation 
with respect to $\langle \vec{\beta}_1, \ldots, \vec{\beta}_i - k\vec{\beta}_j, \ldots, \vec{\beta}_j, \ldots, \vec{\beta}_n \rangle$ 

$$\vec{v} = c_1\vec{\beta}_1 + \cdots + c_i\vec{\beta}_i + \cdots + c_j\vec{\beta}_j + \cdots + c_n\vec{\beta}_n
\ \longmapsto\ c_1\vec{\beta}_1 + \cdots + c_i(\vec{\beta}_i - k\vec{\beta}_j) + \cdots + (kc_i + c_j)\vec{\beta}_j + \cdots + c_n\vec{\beta}_n = \vec{v}$$

(the definition of reduction matrices specifies that $i \neq j$ and $k \neq 0$). QED 

1.6 Corollary A matrix is nonsingular if and only if it represents the identity map 
with respect to some pair of bases. 

In the next subsection we will see how to translate among representations 
of maps, that is, how to change $\mathrm{Rep}_{B,D}(h)$ to $\mathrm{Rep}_{\hat{B},\hat{D}}(h)$. The above corollary 
is a special case of this, where the domain and range are the same space, and 
where the map is the identity map. 

Exercises 

/ 1.7 In $\mathbb{R}^2$, where 

-<(0'(1)> 

find the change of basis matrices from D to $\mathcal{E}_2$ and from $\mathcal{E}_2$ to D. Multiply the 
two. 

/ 1.8 Find the change of basis matrix for $B, D \subset \mathbb{R}^2$. 

(a) B = £2, D = (62, e,) (b) B = £2, D = (Qj , Q) 

,.,BH(;),(:),,o.e. -)-<(-;) ^(^>.-<C).(j> 

1.9 For the bases in Exercise 8, find the change of basis matrix in the other direction, 
from D to B. 

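/ 1.10 Find the change of basis matrix for each $B, D \subset \mathcal{P}_2$. 

(a) $B = \langle 1, x, x^2 \rangle$, $D = \langle x^2, 1, x \rangle$ (b) $B = \langle 1, x, x^2 \rangle$, $D = \langle 1, 1+x, 1+x+x^2 \rangle$ 

(c) $B = \langle 2, 2x, x^2 \rangle$, $D = \langle 1+x^2, 1-x^2, x+x^2 \rangle$ 
/ 1.11 Decide if each changes bases on $\mathbb{R}^2$. To what basis is $\mathcal{E}_2$ changed? 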

(0 s) (3 ;) (1 -t) c 

1.12 Find bases such that this matrix represents the identity map with respect to 
those bases. 

P ') 

\0 4/ 

1.13 Consider the vector space of real-valued functions with basis $\langle \sin(x), \cos(x) \rangle$. 
Show that $\langle 2\sin(x) + \cos(x), 3\cos(x) \rangle$ is also a basis for this space. Find the change 
of basis matrix in each direction. 

1.14 Where does this matrix 

$$\begin{pmatrix} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{pmatrix}$$

send the standard basis for $\mathbb{R}^2$? Any other bases? Hint. Consider the inverse. 
/ 1.15 What is the change of basis matrix with respect to B, B? 

1.16 Prove that a matrix changes bases if and only if it is invertible. 

1.17 Finish the proof of Lemma 1.5. 

/ 1.18 Let H be an n×n nonsingular matrix. What basis of $\mathbb{R}^n$ does H change to the 
standard basis? 

/ 1.19 (a) In $\mathcal{P}_3$ with basis $B = \langle 1+x, 1-x, x^2+x^3, x^2-x^3 \rangle$ we have this representation. 

$$\mathrm{Rep}_B(1 - x + 3x^2 - x^3) = \begin{pmatrix} 0\\1\\1\\2 \end{pmatrix}_B$$

Find a basis D giving this different representation for the same polynomial. 

$$\mathrm{Rep}_D(1 - x + 3x^2 - x^3) = \begin{pmatrix} 1\\0\\2\\0 \end{pmatrix}_D$$

(b) State and prove that we can change any nonzero vector representation to any 
other. 

Hint. The proof of Lemma 1.5 is constructive: it not only says the bases change, 
it shows how they change. 

1.20 Let V, W be vector spaces, and let $B, \hat{B}$ be bases for V and $D, \hat{D}$ be bases for 
W. Where $h\colon V \to W$ is linear, find a formula relating $\mathrm{Rep}_{B,D}(h)$ to $\mathrm{Rep}_{\hat{B},\hat{D}}(h)$. 

/ 1.21 Show that the columns of an n×n change of basis matrix form a basis for 
$\mathbb{R}^n$. Do all bases appear in that way: can the vectors from any $\mathbb{R}^n$ basis make the 
columns of a change of basis matrix? 
/ 1.22 Find a matrix having this effect. 



That is, find a M that left-multiplies the starting vector to yield the ending vector. 
Is there a matrix having these two effects? 



(b) 



Give a necessary and sufficient condition for there to be a matrix such that $\vec{v}_1 \mapsto \vec{w}_1$ 
and $\vec{v}_2 \mapsto \vec{w}_2$. 



V.2 Changing Map Representations 

The first subsection shows how to convert the representation of a vector with 
respect to one basis to the representation of that same vector with respect to 
another basis. Here we will see how to convert the representation of a map with 
respect to one pair of bases to the representation of that map with respect to a 
different pair, how to change $\mathrm{Rep}_{B,D}(h)$ to $\mathrm{Rep}_{\hat{B},\hat{D}}(h)$. 

That is, we want the relationship between the matrices in this arrow diagram. 

$$\begin{array}{ccc}
V_{\text{wrt } B} & \xrightarrow[H]{\ h\ } & W_{\text{wrt } D} \\
\mathrm{id}\downarrow & & \mathrm{id}\downarrow \\
V_{\text{wrt } \hat{B}} & \xrightarrow[\hat{H}]{\ h\ } & W_{\text{wrt } \hat{D}}
\end{array}$$

To move from the lower-left of this diagram to the lower-right we can either 
go straight over, or else up to $V_B$ then over to $W_D$ and then down. So we 
can calculate $\hat{H} = \mathrm{Rep}_{\hat{B},\hat{D}}(h)$ either by simply using $\hat{B}$ and $\hat{D}$, or else by first 
changing bases with $\mathrm{Rep}_{\hat{B},B}(\mathrm{id})$ then multiplying by $H = \mathrm{Rep}_{B,D}(h)$ and then 
changing bases with $\mathrm{Rep}_{D,\hat{D}}(\mathrm{id})$. 
This equation summarizes. 

$$\hat{H} = \mathrm{Rep}_{D,\hat{D}}(\mathrm{id}) \cdot H \cdot \mathrm{Rep}_{\hat{B},B}(\mathrm{id}) \qquad (*)$$

(To compare this equation with the sentence before it, remember to read the 
equation from right to left because we read function composition from right to 
left and matrix multiplication represents composition.) 

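2.1 Example The matrix 

$$T = \begin{pmatrix} \cos(\pi/6) & -\sin(\pi/6) \\ \sin(\pi/6) & \cos(\pi/6) \end{pmatrix} = \begin{pmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{pmatrix}$$

represents, with respect to $\mathcal{E}_2, \mathcal{E}_2$, the transformation $t\colon \mathbb{R}^2 \to \mathbb{R}^2$ that rotates 
vectors $\pi/6$ radians counterclockwise. 

$$\begin{pmatrix} 1\\3 \end{pmatrix} \stackrel{t}{\longmapsto} \begin{pmatrix} (-3+\sqrt{3})/2 \\ (1+3\sqrt{3})/2 \end{pmatrix}$$

We can translate that to a representation with respect to 

$$\hat{B} = \langle \begin{pmatrix} 1\\1 \end{pmatrix}, \begin{pmatrix} 0\\2 \end{pmatrix} \rangle \qquad \hat{D} = \langle \begin{pmatrix} -1\\0 \end{pmatrix}, \begin{pmatrix} 2\\3 \end{pmatrix} \rangle$$

by using the arrow diagram and formula (*) above. 

$$\begin{array}{ccc}
\mathbb{R}^2_{\text{wrt } \mathcal{E}_2} & \xrightarrow[T]{\ t\ } & \mathbb{R}^2_{\text{wrt } \mathcal{E}_2} \\
\mathrm{id}\downarrow & & \mathrm{id}\downarrow \\
\mathbb{R}^2_{\text{wrt } \hat{B}} & \xrightarrow[\hat{T}]{\ t\ } & \mathbb{R}^2_{\text{wrt } \hat{D}}
\end{array}
\qquad
\hat{T} = \mathrm{Rep}_{\mathcal{E}_2,\hat{D}}(\mathrm{id}) \cdot T \cdot \mathrm{Rep}_{\hat{B},\mathcal{E}_2}(\mathrm{id})$$

Note that $\mathrm{Rep}_{\mathcal{E}_2,\hat{D}}(\mathrm{id})$ is the matrix inverse of $\mathrm{Rep}_{\hat{D},\mathcal{E}_2}(\mathrm{id})$. 

$$\mathrm{Rep}_{\hat{B},\hat{D}}(t) = \begin{pmatrix} -1&2 \\ 0&3 \end{pmatrix}^{-1}\begin{pmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{pmatrix}\begin{pmatrix} 1&0 \\ 1&2 \end{pmatrix}
= \begin{pmatrix} (5-\sqrt{3})/6 & (3+2\sqrt{3})/3 \\ (1+\sqrt{3})/6 & \sqrt{3}/3 \end{pmatrix}$$

Although the new matrix is messier, the map that it represents is the same. For 
instance, to replicate the effect of t in the picture, start with $\hat{B}$, 

$$\mathrm{Rep}_{\hat{B}}(\begin{pmatrix} 1\\3 \end{pmatrix}) = \begin{pmatrix} 1\\1 \end{pmatrix}_{\hat{B}}$$

apply $\hat{T}$, 

$$\begin{pmatrix} (5-\sqrt{3})/6 & (3+2\sqrt{3})/3 \\ (1+\sqrt{3})/6 & \sqrt{3}/3 \end{pmatrix}\begin{pmatrix} 1\\1 \end{pmatrix}_{\hat{B}} = \begin{pmatrix} (11+3\sqrt{3})/6 \\ (1+3\sqrt{3})/6 \end{pmatrix}_{\hat{D}}$$

and check it against $\hat{D}$ 

$$\frac{11+3\sqrt{3}}{6}\cdot\begin{pmatrix} -1\\0 \end{pmatrix} + \frac{1+3\sqrt{3}}{6}\cdot\begin{pmatrix} 2\\3 \end{pmatrix} = \begin{pmatrix} (-3+\sqrt{3})/2 \\ (1+3\sqrt{3})/2 \end{pmatrix}$$

and it gives the same outcome as above. 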
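The translation of a map representation from one pair of bases to another can be checked numerically. This sketch works in floating point and makes assumptions that should be compared with the example: the new bases are taken to be ⟨(1,1), (0,2)⟩ for the domain and ⟨(-1,0), (2,3)⟩ for the codomain, stored as matrix columns, and the test vector is (1,3). The names `B_hat`, `D_hat`, and `T_hat` are illustrative.

```python
import math

def matmul(A, B):
    """Plain row-by-column matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix by the ad - bc formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

angle = math.pi / 6
T = [[math.cos(angle), -math.sin(angle)],      # rotation, standard bases
     [math.sin(angle), math.cos(angle)]]

B_hat = [[1, 0],       # columns (1,1) and (0,2): new domain basis to standard
         [1, 2]]
D_hat = [[-1, 2],      # columns (-1,0) and (2,3): new codomain basis to standard
         [0, 3]]

# Read right to left: change from B_hat to standard, rotate, change to D_hat.
T_hat = matmul(inverse_2x2(D_hat), matmul(T, B_hat))

# The vector (1,3) has representation (1,1) with respect to B_hat.
image_wrt_D_hat = matmul(T_hat, [[1], [1]])
image = matmul(D_hat, image_wrt_D_hat)          # back to standard coordinates
direct = matmul(T, [[1], [3]])                  # rotate (1,3) directly
```

The two routes through the arrow diagram agree: `image` and `direct` are the same vector up to rounding.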

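2.2 Example We may make the matrix simpler by changing bases. On $\mathbb{R}^3$ the 
map 

$$\begin{pmatrix} x\\y\\z \end{pmatrix} \stackrel{t}{\longmapsto} \begin{pmatrix} y+z \\ x+z \\ x+y \end{pmatrix}$$

is represented with respect to the standard basis in this way. 

$$\mathrm{Rep}_{\mathcal{E}_3,\mathcal{E}_3}(t) = \begin{pmatrix} 0&1&1 \\ 1&0&1 \\ 1&1&0 \end{pmatrix}$$

Represented with respect to 

$$B = \langle \begin{pmatrix} 1\\-1\\0 \end{pmatrix}, \begin{pmatrix} 1\\1\\-2 \end{pmatrix}, \begin{pmatrix} 1\\1\\1 \end{pmatrix} \rangle$$

gives a matrix that is diagonal. 

$$\mathrm{Rep}_{B,B}(t) = \begin{pmatrix} -1&0&0 \\ 0&-1&0 \\ 0&0&2 \end{pmatrix}$$
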

Naturally we usually prefer basis changes that make the representation easier 
to understand. We say that a map or matrix has been diagonalized when its 
representation is diagonal with respect to B,B, that is, with respect to equal 
starting and ending bases. In Chapter Five we shall see which maps and matrices 
are diagonalizable. In the rest of this subsection we consider the easier case 
where representations are with respect to B,D, which are possibly different 
starting and ending bases. Recall that the prior subsection shows that a matrix 
changes bases if and only if it is nonsingular. That gives us another version of 
the above arrow diagram and equation (*) from the start of this subsection. 



2.3 Definition Same-sized matrices $H$ and $\hat{H}$ are matrix equivalent if there are
nonsingular matrices $P$ and $Q$ such that $\hat{H} = PHQ$.

2.4 Corollary Matrix equivalent matrices represent the same map, with respect
to appropriate pairs of bases.

Exercise 19 checks that matrix equivalence is an equivalence relation. Thus 
it partitions the set of matrices into matrix equivalence classes. 



We can get some insight into the classes by comparing matrix equivalence with 
row equivalence (remember that matrices are row equivalent when they can 
be reduced to each other by row operations). In $\hat{H} = PHQ$, the matrices P
and Q are nonsingular and thus we can write each as a product of elementary 
reduction matrices (Lemma 4.7). Left-multiplication by the reduction matrices 
making up P has the effect of performing row operations. Right-multiplication 
by the reduction matrices making up Q performs column operations. Therefore, 
matrix equivalence is a generalization of row equivalence — two matrices are row 
equivalent if one can be converted to the other by a sequence of row reduction 
steps, while two matrices are matrix equivalent if one can be converted to the 
other by a sequence of row reduction steps followed by a sequence of column 
reduction steps. 

Thus, if matrices are row equivalent then they are also matrix equivalent 
(since we can take Q to be the identity matrix and so perform no column 
operations). The converse, however, does not hold: two matrices can be matrix 
equivalent but not row equivalent. 

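2.5 Example These two

$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \qquad \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$$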

are matrix equivalent because the second reduces to the first by the column 
operation of taking —1 times the first column and adding to the second. They 
are not row equivalent because they have different reduced echelon forms (in 
fact, both are already in reduced form). 
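
The contrast between the two relations can be tested mechanically. This is a sketch under stated assumptions: the two matrices are taken to be the pair with rows (1 0), (0 0) and (1 1), (0 0), row equivalence is tested by comparing reduced echelon forms, and matrix equivalence is tested with the rank criterion that this subsection establishes as Corollary 2.8. The function names are hypothetical.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in matrix]
    pivot_row = 0
    for col in range(len(m[0])):
        # Look for a row at or below pivot_row with a nonzero entry here.
        pivot = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

def rank(matrix):
    """Count the nonzero rows of the reduced echelon form."""
    return sum(1 for row in rref(matrix) if any(x != 0 for x in row))

def row_equivalent(a, b):
    """Same-sized matrices are row equivalent when their reduced forms agree."""
    return len(a) == len(b) and rref(a) == rref(b)

def matrix_equivalent(a, b):
    """Same-sized matrices are matrix equivalent when their ranks agree."""
    same_size = len(a) == len(b) and len(a[0]) == len(b[0])
    return same_size and rank(a) == rank(b)

first = [[1, 0], [0, 0]]
second = [[1, 1], [0, 0]]
print(row_equivalent(first, second), matrix_equivalent(first, second))
```

Exact `Fraction` arithmetic is used so that the comparison of reduced forms is not disturbed by rounding.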

We will close this section by finding a set of representatives for the matrix 
equivalence classes.* 

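2.6 Theorem Any m×n matrix of rank k is matrix equivalent to the m×n matrix
that is all zeros except that the first k diagonal entries are ones.

$$\begin{pmatrix} 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\ & & \ddots & & & & \\ 0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ & & & \vdots & & & \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix}$$

This is a block partial-identity form.

$$\begin{pmatrix} I & Z \\ Z & Z \end{pmatrix}$$

[Figure: the set of all matrices partitioned into matrix equivalence classes; $H$ and $\hat{H}$ lie in the same class when $H$ is matrix equivalent to $\hat{H}$.]

* More information on class representatives is in the appendix.
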



Proof As discussed above, Gauss-Jordan reduce the given matrix and combine
all the reduction matrices used there to make P. Then use the leading entries to 
do column reduction and finish by swapping columns to put the leading ones on 
the diagonal. Combine the reduction matrices used for those column operations 
into Q. QED 
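
The steps of the proof can be sketched in code. This is an illustration under assumptions, not the book's procedure verbatim: it row-reduces while recording the row operations in P, and then obtains the column operations by row-reducing the transpose (column operations on a matrix are row operations on its transpose). The matrix `H` and the function names are hypothetical choices made for the example.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def row_reduce(matrix):
    """Gauss-Jordan reduce; return (reduced, p) with p * matrix = reduced."""
    m = [[Fraction(x) for x in row] for row in matrix]
    p = identity(len(m))
    pivot_row = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        for t in (m, p):  # apply every row operation to both matrices
            t[pivot_row], t[pivot] = t[pivot], t[pivot_row]
        lead = m[pivot_row][col]
        for t in (m, p):
            t[pivot_row] = [x / lead for x in t[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                for t in (m, p):
                    t[r] = [x - factor * y for x, y in zip(t[r], t[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m, p

def equivalence_factors(h):
    """Return (P, Q, canonical) with P * h * Q = canonical."""
    reduced, p = row_reduce(h)
    # Column operations on `reduced` are row operations on its transpose.
    canonical_t, q_t = row_reduce(transpose(reduced))
    return p, transpose(q_t), transpose(canonical_t)

# Illustrative rank-two matrix (an assumption for this sketch).
H = [[1, 2, 1, -1], [0, 0, 1, -1], [2, 4, 2, -2]]
P, Q, canonical = equivalence_factors(H)
```

Since P and Q are products of elementary reduction matrices they are nonsingular, so the result exhibits `H` as matrix equivalent to the block partial-identity matrix of the same rank.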

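2.7 Example We illustrate the proof by finding the P and Q for this matrix.

$$\begin{pmatrix} 1 & 2 & 1 & -1 \\ 0 & 0 & 1 & -1 \\ 2 & 4 & 2 & -2 \end{pmatrix}$$

First Gauss-Jordan row-reduce.

$$\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 & -1 \\ 0 & 0 & 1 & -1 \\ 2 & 4 & 2 & -2 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

Then column-reduce, which involves right-multiplication.

$$\begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & -2 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

Finish by swapping columns.

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

Finally, combine the left-multipliers together as P and the right-multipliers
together as Q to get the PHQ equation.

$$\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 & -1 \\ 0 & 0 & 1 & -1 \\ 2 & 4 & 2 & -2 \end{pmatrix} \begin{pmatrix} 1 & 0 & -2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
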

2.8 Corollary Two same-sized matrices are matrix equivalent if and only if they 
have the same rank. 

That is, the matrix equivalence classes are characterized by rank. 

Proof Two same-sized matrices with the same rank are equivalent to the same 
block partial-identity matrix. QED 

2.9 Example The 2x2 matrices have only three possible ranks: zero, one, or two. 
Thus there are three matrix-equivalence classes. 



[Figure: all 2x2 matrices, partitioned into three equivalence classes.]



Each class consists of all of the 2x2 matrices with the same rank. There is only 
one rank zero matrix so that class has only one member. The other two classes 
have infinitely many members. 

In this subsection we have seen how to change the representation of a map 
with respect to a first pair of bases to one with respect to a second pair. That 
led to a definition describing when matrices are equivalent in this way. Finally 
we noted that, with the proper choice of (possibly different) starting and ending 
bases, any map can be represented in block partial-identity form. 

One of the nice things about this representation is that, in some sense, we 
can completely understand the map when we express it in this way: if the bases 
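are $B = \langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ and $D = \langle \vec{\delta}_1, \ldots, \vec{\delta}_m \rangle$ then the map sends

$$c_1\vec{\beta}_1 + \cdots + c_k\vec{\beta}_k + c_{k+1}\vec{\beta}_{k+1} + \cdots + c_n\vec{\beta}_n \;\longmapsto\; c_1\vec{\delta}_1 + \cdots + c_k\vec{\delta}_k + \vec{0} + \cdots + \vec{0}$$

where k is the map's rank. Thus, we can understand any linear map as a kind
of projection.

$$\begin{pmatrix} c_1 \\ \vdots \\ c_k \\ c_{k+1} \\ \vdots \\ c_n \end{pmatrix}_B \mapsto \begin{pmatrix} c_1 \\ \vdots \\ c_k \\ 0 \\ \vdots \\ 0 \end{pmatrix}_D$$

Of course, "understanding" a map expressed in this way requires that we understand
the relationship between B and D. Nonetheless, this is a good classification
of linear maps.

Exercises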

/ 2.10 Decide if these matrices are matrix equivalent. 

G 3 o)' (o 5 -I, 

(b) f4 o^ 



1 ^J \o 5 
0' (z -I, 

/ 2.11 Find the canonical representative of the matrix-equivalence class of each ma- 
trix. 

/O 1 2\ 
(b) 1 1 4 

\3 3 3 -1/ 



2 1 

4 2 



A / 1 



2.12 Suppose that, with respect to 

the transformation $t\colon \mathbb{R}^2 \to \mathbb{R}^2$ is represented by this matrix.

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$

Use change of basis matrices to represent t with respect to each pair. 
✓ 2.13 What sizes are $P$ and $Q$ in the equation $\hat{H} = PHQ$?

✓ 2.14 Use Theorem 2.6 to show that a square matrix is nonsingular if and only if it
is equivalent to an identity matrix.

✓ 2.15 Show that, where $A$ is a nonsingular square matrix, if $P$ and $Q$ are nonsingular
square matrices such that $PAQ = I$ then $QP = A^{-1}$.

✓ 2.16 Why does Theorem 2.6 not show that every matrix is diagonalizable (see
Example 2.2)?

2.17 Must matrix equivalent matrices have matrix equivalent transposes? 

2.18 What happens in Theorem 2.6 if k = 0? 

/ 2.19 Show that matrix-equivalence is an equivalence relation. 

/ 2.20 Show that a zero matrix is alone in its matrix equivalence class. Are there 
other matrices like that? 

2.21 What are the matrix equivalence classes of matrices of transformations on $\mathbb{R}^1$? $\mathbb{R}^3$?

2.22 How many matrix equivalence classes are there? 

2.23 Are matrix equivalence classes closed under scalar multiplication? Addition? 

2.24 Let $t\colon \mathbb{R}^n \to \mathbb{R}^n$ be represented by $T$ with respect to $\mathcal{E}_n, \mathcal{E}_n$.

(a) Find $\text{Rep}_{B,B}(t)$ in this specific case.

-G -0 -<©■(:;> _ 

(b) Describe $\text{Rep}_{B,B}(t)$ in the general case where $B = \langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$.

2.25 (a) Let $V$ have bases $B_1$ and $B_2$ and suppose that $W$ has the basis $D$. Where
$h\colon V \to W$, find the formula that computes $\text{Rep}_{B_2,D}(h)$ from $\text{Rep}_{B_1,D}(h)$.
(b) Repeat the prior question with one basis for $V$ and two bases for $W$.

2.26 (a) If two matrices are matrix-equivalent and invertible, must their inverses
be matrix-equivalent?

(b) If two matrices have matrix-equivalent inverses, must the two be matrix- 
equivalent? 

(c) If two matrices are square and matrix-equivalent, must their squares be 
matrix-equivalent? 

(d) If two matrices are square and have matrix-equivalent squares, must they be 
matrix-equivalent? 

✓ 2.27 Square matrices are similar if they represent the same transformation, but
each with respect to the same ending as starting basis. That is, $\text{Rep}_{B_1,B_1}(t)$ is
similar to $\text{Rep}_{B_2,B_2}(t)$.

(a) Give a definition of matrix similarity like that of Definition 2.3. 

(b) Prove that similar matrices are matrix equivalent. 

(c) Show that similarity is an equivalence relation. 

(d) Show that if $T$ is similar to $\hat{T}$ then $T^2$ is similar to $\hat{T}^2$, the cubes are similar,
etc. Contrast with the prior exercise.

(e) Prove that there are matrix equivalent matrices that are not similar.

VI Projection

This section is optional. It is only required for the last two sections of 
Chapter Five. 

We have described the projection $\pi$ from $\mathbb{R}^3$ into its xy-plane subspace as a
'shadow map'. This shows why, but it also shows that some shadows fall upward.

So perhaps a better description is: the projection of $\vec{v}$ is the $\vec{p}$ in the plane with
the property that someone standing on $\vec{p}$ and looking directly up or down sees
$\vec{v}$. In this section we will generalize this to other projections, both orthogonal
and non-orthogonal.



VI.1 Orthogonal Projection Into a Line

We first consider orthogonal projection of a vector $\vec{v}$ into a line $\ell$. This picture
shows someone walking out on the line until they are at a point $\vec{p}$ such that
the tip of $\vec{v}$ is directly above them, where "above" does not mean parallel to the
y-axis but instead means orthogonal to the line.

Since we can describe the line as the span of some vector $\ell = \{ c \cdot \vec{s} \mid c \in \mathbb{R} \}$, this
person has found the coefficient $c_{\vec{p}}$ with the property that $\vec{v} - c_{\vec{p}}\vec{s}$ is orthogonal
to $c_{\vec{p}}\vec{s}$.

To solve for this coefficient, observe that because $\vec{v} - c_{\vec{p}}\vec{s}$ is orthogonal to a
scalar multiple of $\vec{s}$, it must be orthogonal to $\vec{s}$ itself. Then $(\vec{v} - c_{\vec{p}}\vec{s}) \cdot \vec{s} = 0$
gives that $c_{\vec{p}} = \vec{v} \cdot \vec{s} / \vec{s} \cdot \vec{s}$.

1.1 Definition The orthogonal projection of $\vec{v}$ into the line spanned by a
nonzero $\vec{s}$ is this vector.

$$\text{proj}_{[\vec{s}\,]}(\vec{v}) = \frac{\vec{v} \cdot \vec{s}}{\vec{s} \cdot \vec{s}} \cdot \vec{s}$$

1.2 Remark That definition says 'spanned by $\vec{s}$' instead of the more formal 'the
span of the set $\{\vec{s}\}$'. This more casual phrase is common.

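1.3 Example To orthogonally project the vector $\begin{pmatrix} 2 \\ 3 \end{pmatrix}$ into the line $y = 2x$, we first
pick a direction vector for the line.

$$\vec{s} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$

The calculation is easy.

$$\frac{\begin{pmatrix} 2 \\ 3 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 2 \end{pmatrix}}{\begin{pmatrix} 1 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 2 \end{pmatrix}} \cdot \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac{8}{5} \cdot \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 8/5 \\ 16/5 \end{pmatrix}$$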




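1.4 Example In $\mathbb{R}^3$, the orthogonal projection of a general vector

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}$$

into the y-axis is

$$\frac{\begin{pmatrix} x \\ y \\ z \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}}{\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}} \cdot \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ y \\ 0 \end{pmatrix}$$

which matches our intuitive expectation.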
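
The formula of Definition 1.1 translates directly into a short function. This is a minimal sketch; the function names and the sample vectors are illustrative assumptions, and exact fractions are used so that the results can be compared with hand computation.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as lists of numbers."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_line(v, s):
    """Orthogonal projection of v into the line spanned by the nonzero s."""
    if all(x == 0 for x in s):
        raise ValueError("the spanning vector s must be nonzero")
    coefficient = Fraction(dot(v, s), dot(s, s))  # (v . s) / (s . s)
    return [coefficient * x for x in s]

# Illustrative data (an assumption): project (2, 3) into the line y = 2x,
# using the direction vector (1, 2).
p = project_onto_line([2, 3], [1, 2])

# Projecting into the y-axis of R^3 keeps only the y component.
q = project_onto_line([7, -4, 5], [0, 1, 0])

# The leftover part v - p is orthogonal to the direction vector.
residual = [a - b for a, b in zip([2, 3], p)]
print(p, q, dot(residual, [1, 2]))
```

The guard against a zero spanning vector mirrors the definition's requirement that $\vec{s}$ be nonzero, since otherwise the denominator $\vec{s} \cdot \vec{s}$ is zero.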

The picture above with the stick figure walking out on the line until v's tip 
is overhead is one way to think of the orthogonal projection of a vector into a 
line. We finish this subsection with two other ways. 

1.5 Example A railroad car left on an east-west track without its brake is pushed
by a wind blowing toward the northeast at fifteen miles per hour; what speed
will the car reach?

For the wind we use a vector of length 15 that points toward the northeast.

$$\vec{v} = \begin{pmatrix} 15\sqrt{1/2} \\ 15\sqrt{1/2} \end{pmatrix}$$

The car is only affected by the part of the wind blowing in the east-west
direction; the part of $\vec{v}$ in the direction of the x-axis is this (the picture has
the same perspective as the railroad car picture above).

$$\vec{p} = \begin{pmatrix} 15\sqrt{1/2} \\ 0 \end{pmatrix}$$

So the car will reach a velocity of $15\sqrt{1/2}$ miles per hour toward the east.




Thus, another way to think of the picture that precedes the definition is that 
it shows $\vec{v}$ as decomposed into two parts, the part $\vec{p}$ with the line, and the part
that is orthogonal to the line (shown above on the north-south axis). These 
two are non-interacting in the sense that the east-west car is not at all affected 
by the north-south part of the wind (see Exercise 11). So we can think of the 
orthogonal projection of v into the line spanned by s as the part of v that lies 
in the direction of s. 

Still another useful way to think of orthogonal projection is to have the 
person stand not on the line but on the vector. This person holds a rope with a 
loop over the line $\ell$.

When they pull the rope tight, the loop slides on $\ell$ until the rope is orthogonal
to that line. That is, we can think of the projection p as being the vector in 
the line that is closest to v (see Exercise 17). 

1.6 Example A submarine is tracking a ship moving along the line $y = 3x + 2$.
Torpedo range is one-half mile. If the sub stays where it is, at the origin on the 
chart below, will the ship pass within range? 



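The formula for projection into a line does not immediately apply because the
line doesn't pass through the origin, and so isn't the span of any $\vec{s}$. To adjust
for this, we start by shifting the entire map down two units. Now the line is
$y = 3x$, a subspace. We project to get the point $\vec{p}$ on the line closest to
the sub's shifted position.

[Figure: the chart, with north and east axes, before and after the shift.]

$$\vec{v} = \begin{pmatrix} 0 \\ -2 \end{pmatrix} \qquad \vec{p} = \frac{\begin{pmatrix} 0 \\ -2 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 3 \end{pmatrix}}{\begin{pmatrix} 1 \\ 3 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 3 \end{pmatrix}} \cdot \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} -3/5 \\ -9/5 \end{pmatrix}$$

The distance between $\vec{v}$ and $\vec{p}$ is about 0.63 miles. The ship will not be in range.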
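
The submarine computation can be reproduced in a few lines. This sketch assumes the setup described in the example (sub at the origin, ship on $y = 3x + 2$, range one-half mile); the variable and function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def closest_point_on_line(v, s):
    """Point of the line spanned by s that is closest to v."""
    c = dot(v, s) / dot(s, s)
    return [c * x for x in s]

# Shift the chart down two units so the ship's line becomes y = 3x,
# which passes through the origin; the sub moves from (0, 0) to (0, -2).
sub_shifted = [0.0, -2.0]
direction = [1.0, 3.0]          # a direction vector for y = 3x

p = closest_point_on_line(sub_shifted, direction)
distance = math.dist(sub_shifted, p)

torpedo_range = 0.5
in_range = distance <= torpedo_range
print(round(distance, 2), in_range)
```

Shifting both the line and the sub by the same amount does not change the distance between them, which is why the projection formula can be applied after the shift.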

This subsection has developed a natural projection map, orthogonal projection
into a line. As suggested by the examples, we use it often in applications.
The next subsection shows how the definition of orthogonal projection into a line 
gives us a way to calculate especially convenient bases for vector spaces, again 
something that we often see in applications. The final subsection completely 
generalizes projection, orthogonal or not, into any subspace at all. 

Exercises 

✓ 1.7 Project the first vector orthogonally into the line spanned by the second vector.

/ 1.8 Project the vector orthogonally into the line. 

(a) ^-1 j , {c ^ lj|ceE} (b) (^^Ij, the line y =3x 

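1.9 Although pictures guided our development of Definition 1.1, we are not restricted
to spaces that we can draw. In $\mathbb{R}^4$ project this vector into this line.

$$\vec{v} = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 3 \end{pmatrix} \qquad \ell = \{ c \cdot \begin{pmatrix} -1 \\ 1 \\ -1 \\ 1 \end{pmatrix} \mid c \in \mathbb{R} \}$$

✓ 1.10 Definition 1.1 uses two vectors $\vec{s}$ and $\vec{v}$. Consider the transformation of $\mathbb{R}^2$
resulting from fixing

$$\vec{s} = \begin{pmatrix} 1 \\ 3 \end{pmatrix}$$

and projecting $\vec{v}$ into the line that is the span of $\vec{s}$. Apply it to these vectors.

Show that in general the projection transformation is this.

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} (x_1 + 3x_2)/10 \\ (3x_1 + 9x_2)/10 \end{pmatrix}$$

Express the action of this transformation with a matrix.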

1.11 Example 1.5 suggests that projection breaks $\vec{v}$ into two parts, $\text{proj}_{[\vec{s}\,]}(\vec{v})$ and
$\vec{v} - \text{proj}_{[\vec{s}\,]}(\vec{v})$, that are non-interacting. Recall that the two are orthogonal. Show
that any two nonzero orthogonal vectors make up a linearly independent set.

1.12 (a) What is the orthogonal projection of $\vec{v}$ into a line if $\vec{v}$ is a member of that
line?
(b) Show that if $\vec{v}$ is not a member of the line then the set $\{\vec{v}, \vec{v} - \text{proj}_{[\vec{s}\,]}(\vec{v})\}$ is
linearly independent.

1.13 Definition 1.1 requires that s be nonzero. Why? What is the right definition 
of the orthogonal projection of a vector into the (degenerate) line spanned by the 
zero vector? 

1.14 Are all vectors the projection of some other vector into some line? 

✓ 1.15 Show that the projection of $\vec{v}$ into the line spanned by $\vec{s}$ has length equal to
the absolute value of the number $\vec{v} \cdot \vec{s}$ divided by the length of the vector $\vec{s}$.

1.16 Find the formula for the distance from a point to a line. 

1.17 Find the scalar $c$ such that the point $(cs_1, cs_2)$ is a minimum distance from the
point $(v_1, v_2)$ by using calculus (i.e., consider the distance function, set the first
derivative equal to zero, and solve). Generalize to $\mathbb{R}^n$.

/ 1.18 Prove that the orthogonal projection of a vector into a line is shorter than the 
vector. 

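✓ 1.19 Show that the definition of orthogonal projection into a line does not depend
on the spanning vector: if $\vec{s}$ is a nonzero multiple of $\vec{q}$ then $(\vec{v} \cdot \vec{s} / \vec{s} \cdot \vec{s}) \cdot \vec{s}$ equals
$(\vec{v} \cdot \vec{q} / \vec{q} \cdot \vec{q}) \cdot \vec{q}$.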

/ 1.20 Consider the function mapping the plane to itself that takes a vector to its 
projection into the line y = x. These two each show that the map is linear, the first 
one in a way that is coordinate-bound (that is, it fixes a basis and then computes) 
and the second in a way that is more conceptual. 

(a) Produce a matrix that describes the function's action. 

(b) Show that we can obtain this map by first rotating everything in the plane
$\pi/4$ radians clockwise, then projecting into the x-axis, and then rotating $\pi/4$ radians
counterclockwise.

1.21 For $\vec{a}, \vec{b} \in \mathbb{R}^n$ let $\vec{v}_1$ be the projection of $\vec{a}$ into the line spanned by $\vec{b}$, let $\vec{v}_2$ be
the projection of $\vec{v}_1$ into the line spanned by $\vec{a}$, let $\vec{v}_3$ be the projection of $\vec{v}_2$ into
the line spanned by $\vec{b}$, etc., back and forth between the spans of $\vec{a}$ and $\vec{b}$. That is,
$\vec{v}_{i+1}$ is the projection of $\vec{v}_i$ into the span of $\vec{a}$ if $i + 1$ is even, and into the span
of $\vec{b}$ if $i + 1$ is odd. Must that sequence of vectors eventually settle down; must
there be a sufficiently large $i$ such that $\vec{v}_{i+2}$ equals $\vec{v}_i$ and $\vec{v}_{i+3}$ equals $\vec{v}_{i+1}$? If so,
what is the earliest such $i$?



VI.2 Gram-Schmidt Orthogonalization



This subsection is optional. We only need the work done here in the final 
two sections of Chapter Five. Also, this subsection requires material from 
the previous subsection, which itself was optional. 

The prior subsection suggests that projecting into the line spanned by $\vec{s}$
decomposes a vector $\vec{v}$ into two parts

$$\vec{v} = \text{proj}_{[\vec{s}\,]}(\vec{v}) + \bigl(\vec{v} - \text{proj}_{[\vec{s}\,]}(\vec{v})\bigr)$$

that are orthogonal and so are non-interacting. We will now develop that
suggestion.

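2.1 Definition Vectors $\vec{v}_1, \ldots, \vec{v}_k \in \mathbb{R}^n$ are mutually orthogonal when any two
are orthogonal: if $i \neq j$ then the dot product $\vec{v}_i \cdot \vec{v}_j$ is zero.

2.2 Theorem If the vectors in a set $\{\vec{v}_1, \ldots, \vec{v}_k\} \subseteq \mathbb{R}^n$ are mutually orthogonal
and nonzero then that set is linearly independent.

Proof Consider a linear relationship $c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = \vec{0}$. If $i \in \{1, \ldots, k\}$
then taking the dot product of $\vec{v}_i$ with both sides of the equation

$$\begin{aligned} \vec{v}_i \cdot (c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k) &= \vec{v}_i \cdot \vec{0} \\ c_i \cdot (\vec{v}_i \cdot \vec{v}_i) &= 0 \end{aligned}$$

shows, since $\vec{v}_i \neq \vec{0}$, that $c_i = 0$. QED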

2.3 Corollary In a k dimensional vector space, if the vectors in a size k set are 
mutually orthogonal and nonzero then that set is a basis for the space. 

Proof Any linearly independent size k subset of a k dimensional space is a 
basis. QED 

Of course, the converse of Corollary 2.3 does not hold: not every basis of
every subspace of $\mathbb{R}^n$ has mutually orthogonal vectors. However, we can get
the partial converse that for every subspace of $\mathbb{R}^n$ there is at least one basis
consisting of mutually orthogonal vectors.

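2.4 Example The members $\vec{\beta}_1$ and $\vec{\beta}_2$ of this basis for $\mathbb{R}^2$ are not orthogonal.

$$B = \langle \begin{pmatrix} 4 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \end{pmatrix} \rangle$$

However, we can derive from B a new basis for the same space that does have
mutually orthogonal members. For the first member of the new basis we simply
use $\vec{\beta}_1$.

$$\vec{\kappa}_1 = \begin{pmatrix} 4 \\ 2 \end{pmatrix}$$

For the second member of the new basis, we subtract from $\vec{\beta}_2$ the part in the
direction of $\vec{\kappa}_1$. This leaves the part of $\vec{\beta}_2$ that is orthogonal to $\vec{\kappa}_1$.

$$\vec{\kappa}_2 = \begin{pmatrix} 1 \\ 3 \end{pmatrix} - \text{proj}_{[\vec{\kappa}_1]}(\begin{pmatrix} 1 \\ 3 \end{pmatrix}) = \begin{pmatrix} 1 \\ 3 \end{pmatrix} - \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 2 \end{pmatrix}$$

By the corollary $\langle \vec{\kappa}_1, \vec{\kappa}_2 \rangle$ is a basis for $\mathbb{R}^2$.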

2.5 Definition An orthogonal basis for a vector space is a basis of mutually 
orthogonal vectors. 

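2.6 Example To turn this basis for $\mathbb{R}^3$

$$B = \langle \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} \rangle$$

into an orthogonal basis we take the first vector as it is.

$$\vec{\kappa}_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

We get $\vec{\kappa}_2$ by starting with $\vec{\beta}_2$ and subtracting the part in the direction of $\vec{\kappa}_1$.

$$\vec{\kappa}_2 = \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix} - \text{proj}_{[\vec{\kappa}_1]}(\begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}) = \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix} - \begin{pmatrix} 2/3 \\ 2/3 \\ 2/3 \end{pmatrix} = \begin{pmatrix} -2/3 \\ 4/3 \\ -2/3 \end{pmatrix}$$

We get $\vec{\kappa}_3$ by taking $\vec{\beta}_3$ and subtracting the part in the direction of $\vec{\kappa}_1$ and also
the part in the direction of $\vec{\kappa}_2$.

$$\vec{\kappa}_3 = \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} - \text{proj}_{[\vec{\kappa}_1]}(\begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}) - \text{proj}_{[\vec{\kappa}_2]}(\begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}) = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}$$

Again, the corollary gives that the result is a basis for $\mathbb{R}^3$.

$$\langle \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -2/3 \\ 4/3 \\ -2/3 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \rangle$$
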

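2.7 Theorem (Gram-Schmidt orthogonalization) If $\langle \vec{\beta}_1, \ldots, \vec{\beta}_k \rangle$ is a basis for a subspace
of $\mathbb{R}^n$ then the vectors

$$\begin{aligned} \vec{\kappa}_1 &= \vec{\beta}_1 \\ \vec{\kappa}_2 &= \vec{\beta}_2 - \text{proj}_{[\vec{\kappa}_1]}(\vec{\beta}_2) \\ \vec{\kappa}_3 &= \vec{\beta}_3 - \text{proj}_{[\vec{\kappa}_1]}(\vec{\beta}_3) - \text{proj}_{[\vec{\kappa}_2]}(\vec{\beta}_3) \\ &\;\;\vdots \\ \vec{\kappa}_k &= \vec{\beta}_k - \text{proj}_{[\vec{\kappa}_1]}(\vec{\beta}_k) - \cdots - \text{proj}_{[\vec{\kappa}_{k-1}]}(\vec{\beta}_k) \end{aligned}$$

form an orthogonal basis for the same subspace.

2.8 Remark This is restricted to $\mathbb{R}^n$ only because we have not given a definition
of orthogonality for any other spaces.

Proof We will use induction to check that each $\vec{\kappa}_i$ is nonzero, is in the span of
$\langle \vec{\beta}_1, \ldots, \vec{\beta}_i \rangle$, and is orthogonal to all preceding vectors $\vec{\kappa}_1 \cdot \vec{\kappa}_i = \cdots = \vec{\kappa}_{i-1} \cdot \vec{\kappa}_i = 0$.
Then with Corollary 2.3 we will have that $\langle \vec{\kappa}_1, \ldots, \vec{\kappa}_k \rangle$ is a basis for the same
space as is $\langle \vec{\beta}_1, \ldots, \vec{\beta}_k \rangle$.

We shall only cover the cases up to i = 3, to give the sense of the argument.
The remaining details are Exercise 25.

The i = 1 case is trivial; setting $\vec{\kappa}_1$ equal to $\vec{\beta}_1$ makes it a nonzero vector
since $\vec{\beta}_1$ is a member of a basis, it is obviously in the span of $\vec{\beta}_1$, and the
'orthogonal to all preceding vectors' condition is satisfied vacuously.

In the i = 2 case the expansion

$$\vec{\kappa}_2 = \vec{\beta}_2 - \text{proj}_{[\vec{\kappa}_1]}(\vec{\beta}_2) = \vec{\beta}_2 - \frac{\vec{\beta}_2 \cdot \vec{\kappa}_1}{\vec{\kappa}_1 \cdot \vec{\kappa}_1} \cdot \vec{\kappa}_1 = \vec{\beta}_2 - \frac{\vec{\beta}_2 \cdot \vec{\kappa}_1}{\vec{\kappa}_1 \cdot \vec{\kappa}_1} \cdot \vec{\beta}_1$$

shows that $\vec{\kappa}_2 \neq \vec{0}$ or else this would be a non-trivial linear dependence among
the $\vec{\beta}$'s (it is nontrivial because the coefficient of $\vec{\beta}_2$ is 1). It also shows that $\vec{\kappa}_2$
is in the span of the first two $\vec{\beta}$'s. And, $\vec{\kappa}_2$ is orthogonal to the only preceding
vector

$$\vec{\kappa}_1 \cdot \vec{\kappa}_2 = \vec{\kappa}_1 \cdot \bigl(\vec{\beta}_2 - \text{proj}_{[\vec{\kappa}_1]}(\vec{\beta}_2)\bigr) = 0$$

because this projection is orthogonal.

The i = 3 case is the same as the i = 2 case except for one detail. As in the
i = 2 case, expand the definition.

$$\begin{aligned} \vec{\kappa}_3 &= \vec{\beta}_3 - \frac{\vec{\beta}_3 \cdot \vec{\kappa}_1}{\vec{\kappa}_1 \cdot \vec{\kappa}_1} \cdot \vec{\kappa}_1 - \frac{\vec{\beta}_3 \cdot \vec{\kappa}_2}{\vec{\kappa}_2 \cdot \vec{\kappa}_2} \cdot \vec{\kappa}_2 \\ &= \vec{\beta}_3 - \frac{\vec{\beta}_3 \cdot \vec{\kappa}_1}{\vec{\kappa}_1 \cdot \vec{\kappa}_1} \cdot \vec{\beta}_1 - \frac{\vec{\beta}_3 \cdot \vec{\kappa}_2}{\vec{\kappa}_2 \cdot \vec{\kappa}_2} \cdot \Bigl(\vec{\beta}_2 - \frac{\vec{\beta}_2 \cdot \vec{\kappa}_1}{\vec{\kappa}_1 \cdot \vec{\kappa}_1} \cdot \vec{\beta}_1\Bigr) \end{aligned}$$

By the first line $\vec{\kappa}_3 \neq \vec{0}$, since $\vec{\beta}_3$ isn't in the span $[\vec{\beta}_1, \vec{\beta}_2]$ and therefore by the
inductive hypothesis it isn't in the span $[\vec{\kappa}_1, \vec{\kappa}_2]$. By the second line $\vec{\kappa}_3$ is in
the span of the first three $\vec{\beta}$'s. Finally, the calculation below shows that $\vec{\kappa}_3$ is
orthogonal to $\vec{\kappa}_1$. (There is a difference between this calculation and the one
in the i = 2 case. Here the second line has two kinds of terms. As happened
for i = 2, the first term is 0 because this projection is orthogonal. But here
the second term is 0 because $\vec{\kappa}_1$ is orthogonal to $\vec{\kappa}_2$ and so is orthogonal to any
vector in the line spanned by $\vec{\kappa}_2$.)

$$\begin{aligned} \vec{\kappa}_1 \cdot \vec{\kappa}_3 &= \vec{\kappa}_1 \cdot \bigl(\vec{\beta}_3 - \text{proj}_{[\vec{\kappa}_1]}(\vec{\beta}_3) - \text{proj}_{[\vec{\kappa}_2]}(\vec{\beta}_3)\bigr) \\ &= \vec{\kappa}_1 \cdot \bigl(\vec{\beta}_3 - \text{proj}_{[\vec{\kappa}_1]}(\vec{\beta}_3)\bigr) - \vec{\kappa}_1 \cdot \text{proj}_{[\vec{\kappa}_2]}(\vec{\beta}_3) \\ &= 0 \end{aligned}$$

A similar check shows that $\vec{\kappa}_3$ is also orthogonal to $\vec{\kappa}_2$. QED

Beyond having the vectors in the basis be orthogonal, we can also normalize
each vector by dividing by its length, to end with an orthonormal basis.

2.9 Example From the orthogonal basis of Example 2.6, normalizing produces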
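this orthonormal basis.

$$\langle \begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{pmatrix}, \begin{pmatrix} -1/\sqrt{6} \\ 2/\sqrt{6} \\ -1/\sqrt{6} \end{pmatrix}, \begin{pmatrix} -1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{pmatrix} \rangle$$

Besides its intuitive appeal, and its analogy with the standard basis $\mathcal{E}_n$ for
$\mathbb{R}^n$, an orthonormal basis also simplifies some computations. An example is in
Exercise 19.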
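
The Gram-Schmidt process of Theorem 2.7, followed by normalization, can be sketched as follows. The function names and the sample basis are illustrative assumptions; exact fractions are used for the orthogonal basis and floating point only for the final normalization.

```python
from fractions import Fraction
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Return mutually orthogonal vectors spanning the same space as `basis`."""
    kappas = []
    for beta in basis:
        kappa = [Fraction(x) for x in beta]
        for k in kappas:
            # Subtract from beta its projection into the line spanned by k.
            coefficient = dot(beta, k) / dot(k, k)
            kappa = [a - coefficient * b for a, b in zip(kappa, k)]
        if all(x == 0 for x in kappa):
            raise ValueError("the input vectors are linearly dependent")
        kappas.append(kappa)
    return kappas

def normalize(v):
    """Divide a vector by its length."""
    length = math.sqrt(dot(v, v))
    return [float(x) / length for x in v]

# Sample basis for R^3 (an assumption chosen for illustration).
basis = [[1, 1, 1], [0, 2, 0], [1, 0, 3]]
orthogonal = gram_schmidt(basis)
orthonormal = [normalize(k) for k in orthogonal]
print(orthogonal)
```

The check for a zero vector corresponds to the step in the proof that each new vector is nonzero, which relies on the input being linearly independent.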

Exercises 

2.10 Perform the Gram-Schmidt process on each of these bases for $\mathbb{R}^2$.

•■•<(;). G)> «'G)'n)> 

Then turn those orthogonal bases into orthonormal bases. 
✓ 2.11 Perform the Gram-Schmidt process on each of these bases for $\mathbb{R}^3$.

Then turn those orthogonal bases into orthonormal bases.
✓ 2.12 Find an orthonormal basis for this subspace of $\mathbb{R}^3$: the plane $x - y + z = 0$.

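2.13 Find an orthonormal basis for this subspace of $\mathbb{R}^4$.

$$\{ \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} \mid x - y - z + w = 0 \text{ and } x + z = 0 \}$$

2.14 Show that any linearly independent subset of $\mathbb{R}^n$ can be orthogonalized without
changing its span.

2.15 What happens if we try to apply the Gram-Schmidt process to a finite set that
is not a basis?

✓ 2.16 What happens if we apply the Gram-Schmidt process to a basis that is already
orthogonal?

2.17 Let $\langle \vec{\kappa}_1, \ldots, \vec{\kappa}_k \rangle$ be a set of mutually orthogonal vectors in $\mathbb{R}^n$.

(a) Prove that for any $\vec{v}$ in the space, the vector $\vec{v} - (\text{proj}_{[\vec{\kappa}_1]}(\vec{v}) + \cdots + \text{proj}_{[\vec{\kappa}_k]}(\vec{v}))$
is orthogonal to each of $\vec{\kappa}_1, \ldots, \vec{\kappa}_k$.

(b) Illustrate the prior item in $\mathbb{R}^3$ by using $\vec{e}_1$ as $\vec{\kappa}_1$, using $\vec{e}_2$ as $\vec{\kappa}_2$, and taking $\vec{v}$
to have components 1, 2, and 3.

(c) Show that $\text{proj}_{[\vec{\kappa}_1]}(\vec{v}) + \cdots + \text{proj}_{[\vec{\kappa}_k]}(\vec{v})$ is the vector in the span of the set of
$\vec{\kappa}$'s that is closest to $\vec{v}$. Hint. To the illustration done for the prior part, add a
vector $d_1\vec{\kappa}_1 + d_2\vec{\kappa}_2$ and apply the Pythagorean Theorem to the resulting triangle.

2.18 Find a vector in $\mathbb{R}^3$ that is orthogonal to both of these.

$$\begin{pmatrix} 1 \\ 5 \\ -1 \end{pmatrix} \qquad \begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix}$$

✓ 2.19 One advantage of orthogonal bases is that they simplify finding the representation
of a vector with respect to that basis.
(a) For this vector and this non-orthogonal basis for $\mathbb{R}^2$

$$\vec{v} = \begin{pmatrix} 2 \\ 3 \end{pmatrix} \qquad B = \langle \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix} \rangle$$

first represent the vector with respect to the basis. Then project the vector into
the span of each basis vector $[\vec{\beta}_1]$ and $[\vec{\beta}_2]$.
(b) With this orthogonal basis for $\mathbb{R}^2$

$$K = \langle \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \rangle$$

represent the same vector $\vec{v}$ with respect to the basis. Then project the vector
into the span of each basis vector. Note that the coefficients in the representation
and the projection are the same.

(c) Let $K = \langle \vec{\kappa}_1, \ldots, \vec{\kappa}_k \rangle$ be an orthogonal basis for some subspace of $\mathbb{R}^n$. Prove
that for any $\vec{v}$ in the subspace, the i-th component of the representation $\text{Rep}_K(\vec{v})$
is the scalar coefficient $(\vec{v} \cdot \vec{\kappa}_i)/(\vec{\kappa}_i \cdot \vec{\kappa}_i)$ from $\text{proj}_{[\vec{\kappa}_i]}(\vec{v})$.

(d) Prove that $\vec{v} = \text{proj}_{[\vec{\kappa}_1]}(\vec{v}) + \cdots + \text{proj}_{[\vec{\kappa}_k]}(\vec{v})$.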

2.20 Bessel's Inequality. Consider these orthonormal sets

$$B_1 = \{\vec{e}_1\} \quad B_2 = \{\vec{e}_1, \vec{e}_2\} \quad B_3 = \{\vec{e}_1, \vec{e}_2, \vec{e}_3\} \quad B_4 = \{\vec{e}_1, \vec{e}_2, \vec{e}_3, \vec{e}_4\}$$

along with the vector $\vec{v} \in \mathbb{R}^4$ whose components are 4, 3, 2, and 1.

(a) Find the coefficient $c_1$ for the projection of $\vec{v}$ into the span of the vector in
$B_1$. Check that $\|\vec{v}\|^2 \geq |c_1|^2$.

(b) Find the coefficients $c_1$ and $c_2$ for the projection of $\vec{v}$ into the spans of the
two vectors in $B_2$. Check that $\|\vec{v}\|^2 \geq |c_1|^2 + |c_2|^2$.

(c) Find $c_1$, $c_2$, and $c_3$ associated with the vectors in $B_3$, and $c_1$, $c_2$, $c_3$, and
$c_4$ for the vectors in $B_4$. Check that $\|\vec{v}\|^2 \geq |c_1|^2 + \cdots + |c_3|^2$ and that
$\|\vec{v}\|^2 \geq |c_1|^2 + \cdots + |c_4|^2$.

Show that this holds in general: where $\{\vec{\kappa}_1, \ldots, \vec{\kappa}_k\}$ is an orthonormal set and $c_i$ is the
coefficient of the projection of a vector $\vec{v}$ from the space then $\|\vec{v}\|^2 \geq |c_1|^2 + \cdots + |c_k|^2$.
Hint. One way is to look at the inequality $0 \leq \|\vec{v} - (c_1\vec{\kappa}_1 + \cdots + c_k\vec{\kappa}_k)\|^2$ and
expand the $c$'s.

2.21 Prove or disprove: every vector in $\mathbb{R}^n$ is in some orthogonal basis.

2.22 Show that the columns of an nxn matrix form an orthonormal set if and only 
if the inverse of the matrix is its transpose. Produce such a matrix. 

2.23 Does the proof of Theorem 2.2 fail to consider the possibility that the set of 
vectors is empty (i.e., that k = 0)? 

2.24 Theorem 2.7 describes a change of basis from any basis $B = \langle \vec{\beta}_1, \ldots, \vec{\beta}_k \rangle$ to
one that is orthogonal $K = \langle \vec{\kappa}_1, \ldots, \vec{\kappa}_k \rangle$. Consider the change of basis matrix
$\text{Rep}_{B,K}(\text{id})$.

(a) Prove that the matrix $\text{Rep}_{K,B}(\text{id})$ changing bases in the direction opposite to
that of the theorem has an upper triangular shape: all of its entries below the
main diagonal are zeros.

(b) Prove that the inverse of an upper triangular matrix is also upper triangular
(if the matrix is invertible, that is). This shows that the matrix $\text{Rep}_{B,K}(\text{id})$
changing bases in the direction described in the theorem is upper triangular.

2.25 Complete the induction argument in the proof of Theorem 2.7. 



VI.3 Projection Into a Subspace

This subsection is optional. It also uses material from the optional earlier
subsection on Combining Subspaces.

The prior subsections project a vector into a line by decomposing it into two
parts: the part in the line $\text{proj}_{[\vec{s}\,]}(\vec{v})$ and the rest $\vec{v} - \text{proj}_{[\vec{s}\,]}(\vec{v})$. To generalize
projection to arbitrary subspaces we will follow this decomposition idea.

3.1 Definition For any direct sum $V = M \oplus N$ and any $\vec{v} \in V$, the projection of
$\vec{v}$ into $M$ along $N$ is

$$\text{proj}_{M,N}(\vec{v}) = \vec{m}$$

where $\vec{v} = \vec{m} + \vec{n}$ with $\vec{m} \in M$, $\vec{n} \in N$.

We can apply this definition in spaces where we don't have a ready definition
of orthogonal. (Definitions of orthogonality for spaces other than the $\mathbb{R}^n$ are
perfectly possible but we haven't seen any in this book.)

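3.2 Example The space $\mathcal{M}_{2\times 2}$ of 2×2 matrices is the direct sum of these two.

$$M = \{ \begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix} \mid a, b \in \mathbb{R} \} \qquad N = \{ \begin{pmatrix} 0 & 0 \\ c & d \end{pmatrix} \mid c, d \in \mathbb{R} \}$$

To project

$$A = \begin{pmatrix} 3 & 1 \\ 0 & 4 \end{pmatrix}$$

into $M$ along $N$, we first fix bases for the two subspaces.

$$B_M = \langle \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \rangle \qquad B_N = \langle \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \rangle$$

The concatenation of these

$$B = B_M {}^\frown B_N = \langle \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \rangle$$

is a basis for the entire space because $\mathcal{M}_{2\times 2}$ is the direct sum. So we can use it
to represent $A$.

$$\begin{pmatrix} 3 & 1 \\ 0 & 4 \end{pmatrix} = 3 \cdot \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + 1 \cdot \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} + 0 \cdot \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} + 4 \cdot \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$$

The projection of $A$ into $M$ along $N$ keeps the $M$ part and drops the $N$ part.

$$\text{proj}_{M,N}(\begin{pmatrix} 3 & 1 \\ 0 & 4 \end{pmatrix}) = 3 \cdot \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + 1 \cdot \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ 0 & 0 \end{pmatrix}$$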
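3.3 Example Both subscripts on $\text{proj}_{M,N}(\vec{v})$ are significant. The first subscript
$M$ matters because the result of the projection is a member of $M$. For an
example showing that the second one matters, fix this plane subspace of $\mathbb{R}^3$ and
its basis.

$$M = \{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \mid y - 2z = 0 \} \qquad B_M = \langle \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix} \rangle$$

Compare the projections along these (verification that $\mathbb{R}^3 = M \oplus N$ and
$\mathbb{R}^3 = M \oplus \hat{N}$ is routine).

$$N = \{ k \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \mid k \in \mathbb{R} \} \qquad \hat{N} = \{ k \begin{pmatrix} 0 \\ 1 \\ -2 \end{pmatrix} \mid k \in \mathbb{R} \}$$

The projections are different because they have different effects on this vector.

$$\vec{v} = \begin{pmatrix} 2 \\ 2 \\ 5 \end{pmatrix}$$

For the first one we find a basis for $N$

$$B_N = \langle \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \rangle$$

and represent $\vec{v}$ with respect to the concatenation $B_M {}^\frown B_N$.

$$\begin{pmatrix} 2 \\ 2 \\ 5 \end{pmatrix} = 2 \cdot \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + 1 \cdot \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix} + 4 \cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$

We find the projection of $\vec{v}$ into $M$ along $N$ by dropping the $N$ component.

$$\text{proj}_{M,N}(\vec{v}) = 2 \cdot \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + 1 \cdot \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}$$

For $\hat{N}$, this basis is natural.

$$B_{\hat{N}} = \langle \begin{pmatrix} 0 \\ 1 \\ -2 \end{pmatrix} \rangle$$

Representing $\vec{v}$ with respect to the concatenation

$$\begin{pmatrix} 2 \\ 2 \\ 5 \end{pmatrix} = 2 \cdot \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + (9/5) \cdot \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix} - (8/5) \cdot \begin{pmatrix} 0 \\ 1 \\ -2 \end{pmatrix}$$

and then keeping only the $M$ part gives this.

$$\text{proj}_{M,\hat{N}}(\vec{v}) = 2 \cdot \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + (9/5) \cdot \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 18/5 \\ 9/5 \end{pmatrix}$$

Therefore projection along different subspaces may yield different results.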
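
Definition 3.1 suggests a direct computation: represent the vector with respect to the concatenated basis and keep only the $M$ part. The following sketch does this by solving a linear system with exact fractions. The function names are hypothetical, and the data (the plane $y - 2z = 0$ with basis $(1,0,0)$, $(0,2,1)$, the vector $(2,2,5)$, and the lines spanned by $(0,0,1)$ and $(0,1,-2)$) are assumptions chosen for illustration.

```python
from fractions import Fraction

def coordinates(basis, v):
    """Coordinates of v with respect to `basis`, found by Gauss-Jordan reduction.

    Assumes `basis` really is a basis of the whole space (so the
    system is square and nonsingular).
    """
    n = len(v)
    # Augmented matrix whose columns are the basis vectors, then v.
    aug = [[Fraction(b[i]) for b in basis] + [Fraction(v[i])] for i in range(n)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        lead = aug[c][c]
        aug[c] = [x / lead for x in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

def project_along(v, basis_m, basis_n):
    """Projection of v into M along N, given bases for M and N."""
    coords = coordinates(basis_m + basis_n, v)
    m_part = [Fraction(0)] * len(v)
    # zip stops after the coordinates that belong to the M basis vectors.
    for c, beta in zip(coords, basis_m):
        m_part = [a + c * b for a, b in zip(m_part, beta)]
    return m_part

basis_m = [[1, 0, 0], [0, 2, 1]]      # basis for the plane y - 2z = 0
v = [2, 2, 5]
along_z_axis = project_along(v, basis_m, [[0, 0, 1]])
along_normal = project_along(v, basis_m, [[0, 1, -2]])
print(along_z_axis, along_normal)
```

The two results differ, which illustrates that the second subscript matters.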

These pictures compare the two maps. Both show that the projection is
indeed 'into' the plane and 'along' the line.

Notice that the projection along $N$ is not orthogonal: there are members of the
plane $M$ that are not orthogonal to the dotted line. But the projection along $\hat{N}$
is orthogonal.

A natural question is: what is the relationship between the projection operation
defined above, and the operation of orthogonal projection into a line?
The second picture above suggests the answer: orthogonal projection into a
line is a special case of the projection defined above; it is just projection along a
subspace perpendicular to the line.




3.4 Definition The orthogonal complement of a subspace $M$ of $\mathbb{R}^n$ is

$$M^\perp = \{ \vec{v} \in \mathbb{R}^n \mid \vec{v} \text{ is perpendicular to all vectors in } M \}$$

(read "M perp"). The orthogonal projection $\text{proj}_M(\vec{v})$ of a vector is its projection
into $M$ along $M^\perp$.

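3.5 Example In $\mathbb{R}^3$, to find the orthogonal complement of the plane

$$P = \{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \mid 3x + 2y - z = 0 \}$$

we start with a basis for $P$.

$$B = \langle \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} \rangle$$

Any $\vec{v}$ perpendicular to every vector in $B$ is perpendicular to every vector in the
span of $B$ (the proof of this is Exercise 19). Therefore, the subspace $P^\perp$ consists
of the vectors that satisfy these two conditions.

$$\begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} \cdot \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = 0 \qquad \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = 0$$

We can express those conditions more compactly as a linear system.

$$P^\perp = \{ \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} \mid \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \}$$

We are thus left with finding the null space of the map represented by the matrix,
that is, with calculating the solution set of a homogeneous linear system.

$$\begin{aligned} v_1 + 3v_3 &= 0 \\ v_2 + 2v_3 &= 0 \end{aligned} \quad\Longrightarrow\quad P^\perp = \{ k \begin{pmatrix} -3 \\ -2 \\ 1 \end{pmatrix} \mid k \in \mathbb{R} \}$$

Instead of the term orthogonal complement, this is sometimes called the line
normal to the plane.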

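3.6 Example Where $M$ is the xy-plane subspace of $\mathbb{R}^3$, what is $M^\perp$? A common
first reaction is that $M^\perp$ is the yz-plane, but that's not right. Some vectors
from the yz-plane are not perpendicular to every vector in the xy-plane.

$$\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \not\perp \begin{pmatrix} 0 \\ 3 \\ 2 \end{pmatrix} \qquad \theta = \arccos\Bigl(\frac{1 \cdot 0 + 1 \cdot 3 + 0 \cdot 2}{\sqrt{2} \cdot \sqrt{13}}\Bigr) \approx 0.94 \text{ rad}$$

Instead $M^\perp$ is the z-axis, since proceeding as in the prior example and taking
the natural basis for the xy-plane gives this.

$$M^\perp = \{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \mid \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \} = \{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \mid x = 0 \text{ and } y = 0 \}$$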



3.7 Lemma If M is a subspace of then orthogonal complement is also a 
subspace. The space is the direct sum of the two — M® M^. And, for any 
V e M."-, the vector v — projf^4(v) is perpendicular to every vector in M. 

Proof First, the orthogonal complement M-'- is a subspace of M"- because, as 
noted in the prior two examples, it is a null space. 

Next, start with any basis Bm — , . . . , flk) for M and expand it to a basis 
for the entire space. Apply the Gram-Schmidt process to get an orthogonal basis 
K = (i<i , . . . , Kn) for R"-. This K is the concatenation of two bases: (ki , . . . , k^) 
with the same number of members k as Bm, and (kic+i , • • • , Kn)- The first is a 
basis for M so if we show that the second is a basis for M-'- then we will have 
that the entire space is the direct sum of the two subspaces. 

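Exercise 19 from the prior subsection proves this about any orthogonal
basis: each vector $\vec{v}$ in the space is the sum of its orthogonal
projections onto the lines spanned by the basis vectors.

$$\vec{v} = \text{proj}_{[\vec\kappa_1]}(\vec{v}) + \cdots + \text{proj}_{[\vec\kappa_n]}(\vec{v}) \qquad (*)$$

To check this, represent the vector as
$\vec{v} = r_1\vec\kappa_1 + \cdots + r_n\vec\kappa_n$, apply $\vec\kappa_i$ to
both sides

$$\vec{v}\cdot\vec\kappa_i = (r_1\vec\kappa_1 + \cdots + r_n\vec\kappa_n)\cdot\vec\kappa_i = r_1\cdot 0 + \cdots + r_i\cdot(\vec\kappa_i\cdot\vec\kappa_i) + \cdots + r_n\cdot 0,$$

and solve to get $r_i = (\vec{v}\cdot\vec\kappa_i)/(\vec\kappa_i\cdot\vec\kappa_i)$,
as desired.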

Since obviously any member of the span of
$\langle \vec\kappa_{k+1}, \ldots, \vec\kappa_n \rangle$ is orthogonal to any
vector in $M$, to show that this is a basis for $M^\perp$ we need only show the
other containment: that any $\vec{w} \in M^\perp$ is in the span of this basis.
The prior paragraph does this. Any $\vec{w} \in M^\perp$ gives this on
projections into basis vectors from $M$:
$\text{proj}_{[\vec\kappa_1]}(\vec{w}) = \vec{0}, \ldots, \text{proj}_{[\vec\kappa_k]}(\vec{w}) = \vec{0}$.
Therefore equation $(*)$ gives that $\vec{w}$ is a linear combination of
$\vec\kappa_{k+1}, \ldots, \vec\kappa_n$. Thus this is a basis for $M^\perp$ and
$\mathbb{R}^n$ is the direct sum of the two.

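The final sentence of the statement of this result is proved in much the same
way. Write
$\vec{v} = \text{proj}_{[\vec\kappa_1]}(\vec{v}) + \cdots + \text{proj}_{[\vec\kappa_n]}(\vec{v})$.
Then $\text{proj}_M(\vec{v})$ keeps only the $M$ part and drops the $M^\perp$
part:
$\text{proj}_M(\vec{v}) = \text{proj}_{[\vec\kappa_1]}(\vec{v}) + \cdots + \text{proj}_{[\vec\kappa_k]}(\vec{v})$.
Therefore $\vec{v} - \text{proj}_M(\vec{v})$ consists of a linear combination of
elements of $M^\perp$ and so is perpendicular to every vector in $M$. QED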
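
The decomposition used in the proof can be tried out on a small case. The following Python sketch is only an illustration: the orthogonal basis `kappa` and the vector `v` are made-up values, with the first two basis vectors taken to span $M$.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj_line(v, k):
    """Orthogonal projection of v onto the line spanned by k."""
    r = Fraction(dot(v, k), dot(k, k))   # the coefficient (v.k)/(k.k)
    return [r * x for x in k]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Hypothetical orthogonal basis of R^3; the first two vectors span M.
kappa = [(1, 1, 0), (1, -1, 0), (0, 0, 1)]
v = (2, 3, 5)

parts = [proj_line(v, k) for k in kappa]

# Equation (*): v is the sum of its projections onto the basis lines.
total = [0, 0, 0]
for p in parts:
    total = add(total, p)

proj_M = add(parts[0], parts[1])                 # keep only the M part
residual = [a - b for a, b in zip(v, proj_M)]    # v - proj_M(v)

print(total, proj_M, residual)
```

The residual has a zero dot product with each basis vector of $M$, which is the final claim of the lemma for this particular case.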

We can find the orthogonal projection into a subspace by following the steps 
of the proof but the next result gives a convenient formula. 

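3.8 Theorem Let $\vec{v}$ be a vector in $\mathbb{R}^n$ and let $M$ be a
subspace of $\mathbb{R}^n$ with basis
$\langle \vec\beta_1, \ldots, \vec\beta_k \rangle$. If $A$ is the matrix whose
columns are the $\vec\beta$'s then
$\text{proj}_M(\vec{v}) = c_1\vec\beta_1 + \cdots + c_k\vec\beta_k$ where the
coefficients $c_i$ are the entries of the vector $(A^TA)^{-1}A^T\cdot\vec{v}$.
That is, $\text{proj}_M(\vec{v}) = A(A^TA)^{-1}A^T\cdot\vec{v}$.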

Proof The vector $\text{proj}_M(\vec{v})$ is a member of $M$ and so it is a
linear combination of basis vectors
$c_1\cdot\vec\beta_1 + \cdots + c_k\cdot\vec\beta_k$. Since $A$'s columns are
the $\vec\beta$'s, that can be expressed as: there is a
$\vec{c} \in \mathbb{R}^k$ such that $\text{proj}_M(\vec{v}) = A\vec{c}$. The
vector $\vec{v} - \text{proj}_M(\vec{v})$ is perpendicular to each member of the
basis so we have this.

$$\vec{0} = A^T(\vec{v} - A\vec{c}) = A^T\vec{v} - A^TA\vec{c}$$

Solving for $\vec{c}$ (showing that $A^TA$ is invertible is an exercise)

$$\vec{c} = (A^TA)^{-1}A^T\cdot\vec{v}$$

gives the formula for the projection matrix as
$\text{proj}_M(\vec{v}) = A\cdot\vec{c}$. QED

3.9 Example To orthogonally project this vector into this subspace





first make a matrix whose columns are a basis for the subspace 



fo 1 



A = 



1 
and then compute.

$$A(A^TA)^{-1}A^T = \begin{pmatrix} 1/2 & 0 & -1/2 \\ 0 & 1 & 0 \\ -1/2 & 0 & 1/2 \end{pmatrix}$$

With the matrix, calculating the orthogonal projection of any vector into P is 
easy. 



projptvj 



1 











1 

















Note, as a check, that this result is indeed in P. 
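
The formula of Theorem 3.8 is easy to carry out by machine. Below is a minimal sketch in plain Python with exact rational arithmetic. The helper names are ours, and the plane $x + z = 0$ with the basis shown is an assumption chosen for illustration, not necessarily the data of the example above.

```python
from fractions import Fraction

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def projection_matrix(a):
    """Return A (A^T A)^{-1} A^T for a matrix A with independent columns."""
    at = transpose(a)
    return matmul(matmul(a, inverse(matmul(at, a))), at)

# Assumed basis for the plane x + z = 0 in R^3, written as the columns of A.
A = [[0, 1],
     [1, 0],
     [0, -1]]
P = projection_matrix(A)
for row in P:
    print([str(x) for x in row])
```

Two properties are worth checking on the output: the matrix is symmetric, and applying it twice gives the same result as applying it once.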
Exercises 

/ 3.10 Project the vectors into M along N. 

(a) (^J), M = {Q|x + y=0}, N = { Q | -x - 2y = 0} 

(b) Q), M = {Q|x-y=0}, N={Q|2x + y=0} 

(c) , M = {0|x + y=O}, N={c-0|ceR} 

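/ 3.11 Find $M^\perp$.

(a) $M = \{ \begin{pmatrix} x \\ y \end{pmatrix} \mid x + y = 0 \}$ (b) $M = \{ \begin{pmatrix} x \\ y \end{pmatrix} \mid -2x + 3y = 0 \}$

(c) $M = \{ \begin{pmatrix} x \\ y \end{pmatrix} \mid x - y = 0 \}$ (d) $M = \{ \vec{0} \}$ (e) $M = \{ \begin{pmatrix} x \\ y \end{pmatrix} \mid x = 0 \}$

(f) $M = \{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \mid -x + 3y + z = 0 \}$ (g) $M = \{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \mid x = 0 \text{ and } y + z = 0 \}$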

3.12 This subsection shows how to project orthogonally in two ways, the method of
Examples 3.2 and 3.3, and the method of Theorem 3.8. To compare them, consider
the plane $P$ specified by $3x + 2y - z = 0$ in $\mathbb{R}^3$.

(a) Find a basis for $P$.

(b) Find $P^\perp$ and a basis for $P^\perp$.

(c) Represent this vector with respect to the concatenation of the two bases from 
the prior item. 

-0 

(d) Find the orthogonal projection of v into P by keeping only the P part from 
the prior item. 

(e) Check that against the result from applying Theorem 3.8. 
/ 3.13 We have three ways to find the orthogonal projection of a vector into a line,
the Definition 1.1 way from the first subsection of this section, the Examples 3.2
and 3.3 way of representing the vector with respect to a basis for the space and
then keeping the $M$ part, and the way of Theorem 3.8. For these cases, do all three
ways.

(a)v=( ; ), M = {(") |x + y=0} 




(b)v= 1 , M = { y x + z = Oandy=0} 



3.14 Check that the operation of Definition 3.1 is well-defined. That is, in
Examples 3.2 and 3.3, doesn't the answer depend on the choice of bases?

3.15 What is the orthogonal projection into the trivial subspace? 

3.16 What is the projection of $\vec{v}$ into $M$ along $N$ if $\vec{v} \in M$?

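3.17 Show that if $M \subseteq \mathbb{R}^n$ is a subspace with orthonormal basis
$\langle \vec\kappa_1, \ldots, \vec\kappa_n \rangle$ then the orthogonal
projection of $\vec{v}$ into $M$ is this.

$$(\vec{v}\cdot\vec\kappa_1)\cdot\vec\kappa_1 + \cdots + (\vec{v}\cdot\vec\kappa_n)\cdot\vec\kappa_n$$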

/ 3.18 Prove that the map $p\colon V \to V$ is the projection into $M$ along $N$ if
and only if the map $\text{id} - p$ is the projection into $N$ along $M$. (Recall
the definition of the difference of two maps:
$(\text{id} - p)(\vec{v}) = \text{id}(\vec{v}) - p(\vec{v}) = \vec{v} - p(\vec{v})$.)

/ 3.19 Show that if a vector is perpendicular to every vector in a set then it is 
perpendicular to every vector in the span of that set. 

3.20 True or false: the intersection of a subspace and its orthogonal complement is 
trivial. 

3.21 Show that the dimensions of orthogonal complements add to the dimension of 
the entire space. 

/ 3.22 Suppose that $\vec{v}_1, \vec{v}_2 \in \mathbb{R}^n$ are such that for all
complements $M, N \subseteq \mathbb{R}^n$, the projections of $\vec{v}_1$ and
$\vec{v}_2$ into $M$ along $N$ are equal. Must $\vec{v}_1$ equal $\vec{v}_2$? (If
so, what if we relax the condition to: all orthogonal projections of the two are
equal?)

/ 3.23 Let $M, N$ be subspaces of $\mathbb{R}^n$. The perp operator acts on
subspaces; we can ask how it interacts with other such operations.

(a) Show that two perps cancel: $(M^\perp)^\perp = M$.

(b) Prove that $M \subseteq N$ implies that $N^\perp \subseteq M^\perp$.

(c) Show that $(M + N)^\perp = M^\perp \cap N^\perp$.

/ 3.24 The material in this subsection allows us to express a geometric relationship
that we have not yet seen between the range space and the null space of a linear
map.

(a) Represent $f\colon \mathbb{R}^3 \to \mathbb{R}$ given by

$$\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} \mapsto 1v_1 + 2v_2 + 3v_3$$

with respect to the standard bases and show that

$$\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$$

is a member of the perp of the null space. Prove that $\mathscr{N}(f)^\perp$ is
equal to the span of this vector.

(b) Generalize that to apply to any $f\colon \mathbb{R}^n \to \mathbb{R}$.

(c) Represent $f\colon \mathbb{R}^3 \to \mathbb{R}^2$

$$\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} \mapsto \begin{pmatrix} 1v_1 + 2v_2 + 3v_3 \\ 4v_1 + 5v_2 + 6v_3 \end{pmatrix}$$

with respect to the standard bases and show that

$$\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}$$

are both members of the perp of the null space. Prove that $\mathscr{N}(f)^\perp$
is the span of these two. (Hint. See the third item of Exercise 23.)

(d) Generalize that to apply to any $f\colon \mathbb{R}^n \to \mathbb{R}^m$.

In [Strang 93] this is called the Fundamental Theorem of Linear Algebra.

3.25 Define a projection to be a linear transformation $t\colon V \to V$ with the
property that repeating the projection does nothing more than does the projection
alone: $(t \circ t)(\vec{v}) = t(\vec{v})$ for all $\vec{v} \in V$.

(a) Show that orthogonal projection into a line has that property.

(b) Show that projection along a subspace has that property.

(c) Show that for any such $t$ there is a basis
$B = \langle \vec\beta_1, \ldots, \vec\beta_n \rangle$ for $V$ such that

$$t(\vec\beta_i) = \begin{cases} \vec\beta_i & i = 1, 2, \ldots, r \\ \vec{0} & i = r+1, r+2, \ldots, n \end{cases}$$

where $r$ is the rank of $t$.

(d) Conclude that every projection is a projection along a subspace.

(e) Also conclude that every projection has a representation
in block partial-identity form.

3.26 A square matrix is symmetric if each $i,j$ entry equals the $j,i$ entry (i.e.,
if the matrix equals its transpose). Show that the projection matrix
$A(A^TA)^{-1}A^T$ is symmetric. [Strang 80] Hint. Find properties of transposes by
looking in the index under 'transpose'.



Line of Best Fit 



This Topic requires the formulas from the subsections on Orthogonal
Projection Into a Line and Projection Into a Subspace.

Scientists are often presented with a system that has no solution and they 
must find an answer anyway. More precisely stated, they must find a best 
answer. 

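For instance, this is the result of flipping a penny, including some intermediate
numbers.

  number of flips   30   60   90
  number of heads   16   34   51

In an experiment we can expect that samples will vary: here, sometimes the
experimental ratio of heads to flips overestimates this penny's long-term ratio
and sometimes it underestimates. So we expect that the system derived from
the experiment has no solution.

$$\begin{aligned} 30m &= 16 \\ 60m &= 34 \\ 90m &= 51 \end{aligned}$$

That is, the vector of data that we collected is not in the subspace where in
theory we should find it.

$$\begin{pmatrix} 16 \\ 34 \\ 51 \end{pmatrix} \notin \{ m\begin{pmatrix} 30 \\ 60 \\ 90 \end{pmatrix} \mid m \in \mathbb{R} \}$$

We have to do something, so we look for the $m$ that most nearly works. An
orthogonal projection of the data vector into the line subspace gives this best
guess.

$$\frac{\begin{pmatrix} 16 \\ 34 \\ 51 \end{pmatrix}\cdot\begin{pmatrix} 30 \\ 60 \\ 90 \end{pmatrix}}{\begin{pmatrix} 30 \\ 60 \\ 90 \end{pmatrix}\cdot\begin{pmatrix} 30 \\ 60 \\ 90 \end{pmatrix}}\cdot\begin{pmatrix} 30 \\ 60 \\ 90 \end{pmatrix} = \frac{7110}{12600}\cdot\begin{pmatrix} 30 \\ 60 \\ 90 \end{pmatrix}$$
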




The estimate ($m = 7110/12600 \approx 0.56$) is a bit more than one half, but not
much, so probably the penny is fair enough.

The line with the slope $m \approx 0.56$ is the line of best fit for this data.
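
The estimate is a one-line computation. This short Python sketch, with variable names of our own choosing, projects the data vector onto the line spanned by the vector of flips and confirms that what is left over is perpendicular to that line.

```python
from fractions import Fraction

flips = [30, 60, 90]
heads = [16, 34, 51]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Coefficient of the orthogonal projection of heads onto the line of flips.
m = Fraction(dot(heads, flips), dot(flips, flips))

projection = [m * f for f in flips]
residual = [h - p for h, p in zip(heads, projection)]

print(m, float(m))
print(dot(residual, flips))   # the residual is perpendicular to the line
```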

[Figure: the three data points and the line of best fit; horizontal axis flips,
vertical axis heads.]



Minimizing the distance between the given vector and the vector used as the
right-hand side minimizes the sum of the squares of these vertical lengths, and
consequently we say that the line comes from fitting by least-squares.

[Figure: the data points, the line, and the vertical segments between them.]

(We have exaggerated the vertical scale by ten to make the lengths visible.)

In the above equation the line must pass through (0,0), because we take it 
to be the line whose slope is this coin's true proportion of heads to flips. We 
can also handle cases where the line need not pass through the origin. 

For example, the different denominations of US money have different average 
times in circulation. [Federal Reserve] (The $2 bill is a special case because 
Americans mistakenly believe that it is collectible and do not circulate these 
bills.) How long should a $25 bill last? 



  denomination           1      5     10     20     50    100
  average life (mos)   22.0   15.9   18.3   24.3   55.4   88.8



The data plot below looks roughly linear. It isn't a perfect line, i.e., the linear
system with equations $b + 1m = 22.0$, ..., $b + 100m = 88.8$ has no solution, but
we can again use orthogonal projection to find a best approximation. Consider
the matrix of coefficients of that linear system and also its vector of constants,
the experimentally-determined values.



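$$A = \begin{pmatrix} 1 & 1 \\ 1 & 5 \\ 1 & 10 \\ 1 & 20 \\ 1 & 50 \\ 1 & 100 \end{pmatrix} \qquad \vec{v} = \begin{pmatrix} 22.0 \\ 15.9 \\ 18.3 \\ 24.3 \\ 55.4 \\ 88.8 \end{pmatrix}$$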



The ending result in the subsection on Projection into a Subspace says that
coefficients $b$ and $m$ so that the linear combination of the columns of $A$ is as
close as possible to the vector $\vec{v}$ are the entries of
$(A^TA)^{-1}A^T\cdot\vec{v}$. Some calculation gives an intercept of $b = 14.16$
and a slope of $m = 0.75$.
[Figure: the bill data and the line of best fit; horizontal axis denom, vertical
axis life (yrs).]

Plugging $x = 25$ into the equation of the line shows that such a bill should last
about two and three-quarters years.
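
The "some calculation" can be sketched as follows. Since $A$ has a column of ones and a column of denominations, $(A^TA)\vec{c} = A^T\vec{v}$ is a two-by-two system that the function below solves directly. The function name `fit_line` is ours, introduced for this illustration.

```python
denominations = [1, 5, 10, 20, 50, 100]
life_months = [22.0, 15.9, 18.3, 24.3, 55.4, 88.8]

def fit_line(xs, ys):
    """Solve the normal equations (A^T A) c = A^T v where A = [ones | xs]."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx            # determinant of A^T A
    b = (sxx * sy - sx * sxy) / det    # intercept
    m = (n * sxy - sx * sy) / det      # slope
    return b, m

b, m = fit_line(denominations, life_months)
predicted_months = b + m * 25          # a hypothetical $25 bill

print(round(b, 2), round(m, 2))
print(round(predicted_months / 12, 2), "years")
```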

We close by considering the progression of world record times for the men's
mile race [Oakley & Baker]. In the early 1900s many people wondered when
this record would fall below the four minute mark. Here are the times that were
in force on January first of each decade through the first half of that century.
(Restricting ourselves to the times at the start of each decade reduces the data
entry burden and gives much the same result. There are different sequences of
times from competing standards bodies but these are from [Wikipedia Mens Mile].)



  year   1870   1880   1890   1900   1910   1920   1930   1940   1950
  secs   268.8  264.5  258.4  255.6  255.6  252.6  250.4  246.4  241.4



We can use this to predict the date for 240 seconds, and we can then compare 
to the actual date. 

Sage gives the slope and intercept. 

sage: data=[[1870,268.8], [1880,264.5], [1890,258.4],
....:       [1900,255.6], [1910,255.6], [1920,252.6],
....:       [1930,250.4], [1940,246.4], [1950,241.4]]
sage: var('slope,intercept')
(slope, intercept)
sage: model(x) = slope*x+intercept
sage: find_fit(data,model)
[intercept == 837.0872267857003,
 slope == -0.30483333572258886]

(People in the year 0 didn't run very fast!) Plotting the data and the line

sage: points(data)
....:   +plot(model(intercept=find_fit(data,model)[0].rhs(),
....:               slope=find_fit(data,model)[1].rhs()),
....:         (x,1860,1960),color='red',figsize=3,fontsize=7)

gives this graph. 

[Figure: the record times plotted against the year, with the line of best fit.]

Note that the progression is surprisingly linear. We predict 1958.73; the actual
date of Roger Bannister's record was 1954-May-06.
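
The prediction can be reproduced without Sage. The sketch below uses the centered form of the least-squares formulas (an equivalent route to the one in the text, chosen here for brevity) and then solves for the year at which the fitted line reaches 240 seconds. The variable names are ours.

```python
years = [1870, 1880, 1890, 1900, 1910, 1920, 1930, 1940, 1950]
secs = [268.8, 264.5, 258.4, 255.6, 255.6, 252.6, 250.4, 246.4, 241.4]

mean_x = sum(years) / len(years)
mean_y = sum(secs) / len(secs)

# Centered least-squares formulas for the slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, secs))
sxx = sum((x - mean_x) ** 2 for x in years)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Year at which the fitted line crosses the four-minute mark.
year_for_240 = (240 - intercept) / slope

print(round(slope, 5), round(intercept, 3), round(year_for_240, 2))
```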

Exercises 

The calculations here are best done on a computer. Some of the problems require
more data, which is available in your library, on the Internet, or in the Answers
to the Exercises.

1 Use least-squares to judge if the coin in this experiment is fair.

  flips    8   16   24   32   40
  heads    4    9   13   17   20



2 For the men's mile record, rather than give each of the many records and its exact 
date, we've "smoothed" the data somewhat by taking a periodic sample. Do the 
longer calculation and compare the conclusions. 

3 Find the line of best fit for the men's 1500 meter run. How does the slope compare
with that for the men's mile? (The distances are close; a mile is about 1609 meters.)

4 Find the line of best fit for the records for women's mile. 

5 Do the lines of best fit for the men's and women's miles cross? 

6 (This illustrates that there are data sets for which a linear model is not right, 
and that the line of best fit doesn't in that case have any predictive value.) In a 
highway restaurant a trucker told me that his boss often sends him by a roundabout 
route, using more gas but paying lower bridge tolls. He said that New York State 
calibrates the toll for each bridge across the Hudson, playing off the extra gas to 
get there from New York City against a lower crossing cost, to encourage people to 
go upstate. This table, from [Cost Of Tolls] and [Google Maps], lists for each toll 
crossing of the Hudson River, the distance to drive from Times Square in miles 
and the cost in US dollars for a passenger car (if a crossing has a one-way toll
then it shows half that number).



  Crossing                      Distance   Toll
  Lincoln Tunnel                       2   6.00
  Holland Tunnel                       7   6.00
  George Washington Bridge             8   6.00
  Verrazano-Narrows Bridge            16   6.50
  Tappan Zee Bridge                   27   2.50
  Bear Mountain Bridge                47   1.00
  Newburgh-Beacon Bridge              67   1.00
  Mid-Hudson Bridge                   82   1.00
  Kingston-Rhinecliff Bridge         102   1.00
  Rip Van Winkle Bridge              120   1.00



Find the line of best fit and graph the data to show that the driver was practicing 
on my credulity. 

7 When the space shuttle Challenger exploded in 1986, one of the criticisms made 
of NASA's decision to launch was in the way they did the analysis of number of 
O-ring failures versus temperature (O-ring failure caused the explosion). Four 
O-ring failures would be fatal. NASA had data from 24 previous flights. 



  temp °F    53  75  57  58  63  70  70  66  67  67  67
  failures    3   2   1   1   1   1   1   0   0   0   0

  temp °F    68  69  70  70  72  73  75  76  76  78  79  80  81
  failures    0   0   0   0   0   0   0   0   0   0   0   0   0

The temperature that day was forecast to be 31 °F.

(a) NASA based the decision to launch partially on a chart showing only the 
flights that had at least one O-ring failure. Find the line that best fits these 
seven flights. On the basis of this data, predict the number of O-ring failures 
when the temperature is 31, and when the number of failures will exceed four. 

(b) Find the line that best fits all 24 flights. On the basis of this extra data, 
predict the number of O-ring failures when the temperature is 31 , and when the 
number of failures will exceed four. 

Which do you think is the more accurate method of predicting? (An excellent
discussion is in [Dalal, et. al.].)

8 This table lists the average distance from the sun to each of the first seven planets,
using Earth's average as a unit.

  Mercury   Venus   Earth   Mars   Jupiter   Saturn   Uranus
    0.39     0.72    1.00   1.52     5.20     9.54     19.2

(a) Plot the number of the planet (Mercury is 1, etc.) versus the distance. Note 
that it does not look like a line, and so finding the line of best fit is not fruitful. 

(b) It does, however look like an exponential curve. Therefore, plot the number 
of the planet versus the logarithm of the distance. Does this look like a line? 

(c) The asteroid belt between Mars and Jupiter is what is left of a planet that 
broke apart. Renumber so that Jupiter is 6, Saturn is 7, and Uranus is 8, and 
plot against the log again. Does this look better? 

(d) Use least squares on that data to predict the location of Neptune. 

(e) Repeat to predict where Pluto is. 

(f) Is the formula accurate for Neptune and Pluto? 

This method was used to help discover Neptune (although the second item is
misleading about the history; actually, the discovery of Neptune in position 9
prompted people to look for the "missing planet" in position 5). See [Gardner, 1970].



Geometry of Linear Maps 



The pictures below contrast the nonlinear maps $f_1(x) = e^x$ and $f_2(x) = x^2$
with the linear maps $h_1(x) = 2x$ and $h_2(x) = -x$. Each shows the domain
$\mathbb{R}^1$ on the left mapped to the codomain $\mathbb{R}^1$ on the right.
Arrows trace where each map sends $x = 0$, $x = 1$, $x = 2$, $x = -1$, and $x = -2$.

Note how the nonlinear maps distort the domain in transforming it into the
range. For instance, $f_1(1)$ is further from $f_1(2)$ than it is from $f_1(0)$:
the map spreads the domain out unevenly so that in moving from domain to range an
interval near $x = 2$ spreads apart more than an interval near $x = 0$.




The linear maps are nicer, more regular, in that for each map all of the domain 
spreads by the same factor. 



The only linear maps from $\mathbb{R}^1$ to $\mathbb{R}^1$ are multiplications by
a scalar but in higher dimensions more can happen. For instance, this linear
transformation of $\mathbb{R}^2$ rotates vectors counterclockwise.

$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix}$$

The transformation of $\mathbb{R}^3$ that projects vectors into the $xz$-plane is
also not simply a rescaling.

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \mapsto \begin{pmatrix} x \\ 0 \\ z \end{pmatrix}$$




Nonetheless, even in higher dimensions the linear maps behave nicely. Consider a
linear map $h\colon \mathbb{R}^n \to \mathbb{R}^m$. We will use the standard bases
to represent it by a matrix $H$. Recall that any such $H$ factors as $H = PBQ$,
where $P$ and $Q$ are nonsingular and $B$ is a partial-identity matrix. Recall
also that nonsingular matrices factor into elementary matrices
$PBQ = T_nT_{n-1}\cdots T_jBT_{j-1}\cdots T_1$, which are matrices that come from
the identity $I$ after one Gaussian step

$$I \xrightarrow{k\rho_i} M_i(k) \qquad I \xrightarrow{\rho_i \leftrightarrow \rho_j} P_{i,j} \qquad I \xrightarrow{k\rho_i + \rho_j} C_{i,j}(k)$$

for $i \neq j$, $k \neq 0$. So if we understand the effect of a linear map described
by a partial-identity matrix and the effect of the linear maps described by the
elementary matrices then we will in some sense understand the effect of any linear
map. (By understanding them we mean giving a description of their geometric
effect; the pictures below stick to transformations of $\mathbb{R}^2$ for ease of
drawing but the principles extend for maps from any $\mathbb{R}^n$ to any
$\mathbb{R}^m$.)

The geometric effect of the linear transformation represented by a
partial-identity matrix is projection.



The geometric effect of the $M_i(k)$ matrices is to stretch vectors by a factor
of $k$ along the $i$-th axis. This map stretches by a factor of 3 along the $x$-axis.

$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} 3x \\ y \end{pmatrix}$$

If $0 \leq k < 1$ or if $k < 0$ then the $i$-th component goes the other way, here
to the left.

$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} -2x \\ y \end{pmatrix}$$

Either of these is a dilation.

A transformation represented by a $P_{i,j}$ matrix interchanges the $i$-th and
$j$-th axes. This is reflection about the line $x_i = x_j$.





Permutations involving more than two axes decompose into a combination of 
swaps of pairs of axes; see Exercise 5. 

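The remaining matrices have the form $C_{i,j}(k)$. For instance $C_{1,2}(2)$
performs $2\rho_1 + \rho_2$.

$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ 2x + y \end{pmatrix}$$

In the picture below, the vector $\vec{u}$ with the first component of 1 is
affected less than the vector $\vec{v}$ with the first component of 2:
$h(\vec{u})$ is only 2 higher than $\vec{u}$ while $h(\vec{v})$ is 4 higher than
$\vec{v}$.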




Any vector with a first component of 1 would be affected in the same way as
$\vec{u}$; it would slide up by 2. And any vector with a first component of 2
would slide up 4, as was $\vec{v}$. That is, the transformation represented by
$C_{i,j}(k)$ affects vectors depending on their $i$-th component.

Another way to see this point is to consider the action of this map on the unit
square. In the next picture, vectors with a first component of 0, like the origin,
are not pushed vertically at all but vectors with a positive first component slide
up. Here, all vectors with a first component of 1, the entire right side of the
square, slide to the same extent. In general, vectors on the same vertical line
slide by the same amount, by twice their first component. The shape of the
result, a parallelogram, has the same base and height as the square (and thus the
same area) but the right angle corners are gone.
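
To see the action on the unit square concretely, here is a small Python sketch. It assumes the matrix for $C_{1,2}(2)$ is the identity after the row operation $2\rho_1 + \rho_2$; the helper names `apply` and `shoelace_area` are ours.

```python
def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (tuple)."""
    return tuple(sum(a * x for a, x in zip(row, vector)) for row in matrix)

def shoelace_area(points):
    """Area of a polygon from its vertices listed in order."""
    total = 0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

# The identity after 2*rho_1 + rho_2: the second row becomes (2, 1).
C12_2 = [[1, 0],
         [2, 1]]

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [apply(C12_2, p) for p in unit_square]

print(image)
print(shoelace_area(unit_square), shoelace_area(image))
```

The corners with first component 0 stay put, the corners with first component 1 slide up by 2, and the area is unchanged.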
For contrast the next picture shows the effect of the map represented by
$C_{2,1}(2)$. Here vectors are affected according to their second component:
$\begin{pmatrix} x \\ y \end{pmatrix}$ slides horizontally by twice $y$.

This kind of map is a skew.

With that, we understand the geometric effect of the four types of components
in the expansion $H = T_nT_{n-1}\cdots T_jBT_{j-1}\cdots T_1$, and so, in some
sense, we have an understanding of the action of any matrix $H$.

We will illustrate the usefulness of our understanding in two ways. The
first is that we will use it to prove something about linear maps. Recall that
under a linear map, the image of a subspace is a subspace and thus the linear
transformation $h$ represented by $H$ maps lines through the origin to lines
through the origin. (The dimension of the image space cannot be greater than the
dimension of the domain space, so a line can't map onto, say, a plane.) We will
show that $h$ maps any line, not just one through the origin, to a line. The proof
is simple: the partial-identity projection $B$ and the elementary $T_i$'s each
turn a line input into a line output (verifying the four cases is Exercise 6).
Therefore their composition also preserves lines.

The second way that we will illustrate the usefulness of our understanding
is to apply it to Calculus. Below is a picture of the action of the one-variable
real function $y(x) = x^2 + x$. As we noted at the start of this Topic, overall
the geometric effect of this map is irregular in that at different domain points
it has different effects; for example as the domain point $x$ goes from 2 to $-2$,
the associated range point $y(x)$ at first decreases, then pauses instantaneously,
and then increases.

But in Calculus we focus less on the map overall, and more on the local effect
of the map. The picture below looks closely at what this map does near $x = 1$.
The derivative is $dy/dx = 2x + 1$ so that near $x = 1$ we have
$\Delta y \approx 3\cdot\Delta x$. That is, in a neighborhood of $x = 1$, in
carrying the domain to the codomain this map causes it to grow by a factor of 3.
It is, locally, approximately, a dilation. The picture below shows a small
interval in the domain $(1 - \Delta x \,..\, 1 + \Delta x)$ carried over to an
interval in the codomain $(2 - \Delta y \,..\, 2 + \Delta y)$ that is three times
as wide, $\Delta y \approx 3\cdot\Delta x$.



In higher dimensions the idea is the same but more can happen than in the
$\mathbb{R}^1$-to-$\mathbb{R}^1$ scalar multiplication case. For a function
$y\colon \mathbb{R}^n \to \mathbb{R}^m$ and a point $\vec{x} \in \mathbb{R}^n$,
the derivative is defined to be the linear map
$h\colon \mathbb{R}^n \to \mathbb{R}^m$ that best approximates how $y$ changes
near $y(\vec{x})$. So the geometry studied above directly applies to the
derivatives.

We will close this Topic by remarking how this point of view makes clear 
an often misunderstood but very important result about derivatives, the Chain 
Rule. Recall that, with suitable conditions on the two functions, the derivative 
of the composition is this. 

$$\frac{d\,(g \circ f)}{dx}(x) = \frac{dg}{dx}(f(x)) \cdot \frac{df}{dx}(x)$$

For instance the derivative of $\sin(x^2 + 3x)$ is $\cos(x^2 + 3x)\cdot(2x + 3)$.
Where does this come from? Consider the $f, g\colon \mathbb{R}^1 \to \mathbb{R}^1$
picture.



[Figure: the domain carried by $f$ and then by $g$, with $x$ sent to $f(x)$ and
then to $g(f(x))$.]

The first map $f$ dilates the neighborhood of $x$ by a factor of

$$\frac{df}{dx}(x)$$

and the second map $g$ follows that by dilating a neighborhood of $f(x)$ by a
factor of

$$\frac{dg}{dx}(f(x))$$

and when combined, the composition dilates by the product of the two. In
higher dimensions the map expressing how a function changes near a point is
a linear map, and is represented by a matrix. The Chain Rule multiplies the
matrices.
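
The "product of local dilation factors" reading of the Chain Rule can be checked numerically. This Python sketch uses a central difference as a stand-in for the derivative; the step size, the sample point `x0`, and the helper name `derivative` are all choices made for this illustration.

```python
import math

def derivative(fn, x, h=1e-6):
    """Central-difference estimate of the local dilation factor of fn at x."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

def f(x):
    return x ** 2 + 3 * x

g = math.sin

def composite(x):
    return g(f(x))

x0 = 0.5
factor_f = derivative(f, x0)            # how f dilates near x0
factor_g = derivative(g, f(x0))         # how g dilates near f(x0)
combined = derivative(composite, x0)    # how the composition dilates near x0
exact = math.cos(x0 ** 2 + 3 * x0) * (2 * x0 + 3)

print(factor_f * factor_g, combined, exact)
```

The three printed numbers agree to several decimal places, matching the example of $\sin(x^2 + 3x)$ given above.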

Thus, the geometry of linear maps $h\colon \mathbb{R}^n \to \mathbb{R}^m$ is
appealing both for its simplicity and for its usefulness.

Exercises 

1 Let $h\colon \mathbb{R}^2 \to \mathbb{R}^2$ be the transformation that rotates
vectors clockwise by $\pi/4$ radians.

(a) Find the matrix H representing h with respect to the standard bases. Use 
Gauss's Method to reduce H to the identity. 

(b) Translate the row reduction to a matrix equation $T_jT_{j-1}\cdots T_1H = I$
(the prior item shows both that $H$ is similar to $I$, and that we need no column
operations to derive $I$ from $H$).

(c) Solve this matrix equation for H. 

(d) Sketch how H is a combination of dilations, flips, skews, and projections (the 
identity is a trivial projection). 

2 What combination of dilations, flips, skews, and projections produces a rotation
counterclockwise by $2\pi/3$ radians?

3 What combination of dilations, flips, skews, and projections produces the map
$h\colon \mathbb{R}^3 \to \mathbb{R}^3$ represented with respect to the standard
bases by this matrix?

$$\begin{pmatrix} 1 & 2 & 1 \\ 3 & 6 & 0 \\ 1 & 2 & 2 \end{pmatrix}$$

4 Show that any linear transformation of $\mathbb{R}^1$ is the map that multiplies
by a scalar, $x \mapsto kx$.

5 Show that for any permutation (that is, reordering) $p$ of the numbers
$1, \ldots, n$, the map

$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \mapsto \begin{pmatrix} x_{p(1)} \\ x_{p(2)} \\ \vdots \\ x_{p(n)} \end{pmatrix}$$

can be done with a composition of maps, each of which only swaps a single pair of
coordinates. Hint: you can use induction on $n$. (Remark: in the fourth chapter we
will show this and we will also show that the parity of the number of swaps used is
determined by $p$. That is, although a particular permutation could be expressed in
two different ways with two different numbers of swaps, either both ways use an
even number of swaps, or both use an odd number.)

6 Show that linear maps preserve the linear structures of a space. 

(a) Show that for any linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$, the image
of any line is a line. The image may be a degenerate line, that is, a single point.

(b) Show that the image of any linear surface is a linear surface. This generalizes
the result that under a linear map the image of a subspace is a subspace.

(c) Linear maps preserve other linear ideas. Show that linear maps preserve
"betweenness": if the point $B$ is between $A$ and $C$ then the image of $B$ is
between the image of $A$ and the image of $C$.

7 Use a picture like the one that appears in the discussion of the Chain Rule to
answer: if a function $f\colon \mathbb{R} \to \mathbb{R}$ has an inverse, what's
the relationship between how the function (locally, approximately) dilates space,
and how its inverse dilates space?



Magic Squares 



A Chinese legend tells the story of a flood by the Lo river. The people offered
sacrifices to appease the river. But each time a turtle emerged, walked around the
sacrifice, and returned to the water. Fuh-Hi, the founder of Chinese civilization,
interpreted this to mean that the river was still annoyed. Fortunately, a child
noticed that on its shell the turtle had the pattern on the left below, which is
today called Lo Shu ("river scroll").

[Figure: the Lo Shu dot pattern (left) and the matrix that it gives (right).]

  4  9  2
  3  5  7
  8  1  6

The dots make the matrix on the right where the rows, columns, and diagonals
add to 15. Now that the people knew how much to sacrifice, the river's anger
cooled.

A square matrix is magic if each row, column, and diagonal add to the same
value, the matrix's magic number.

Another example of a magic square appears in the engraving Melencolia I
by Albrecht Dürer.

[Figure: the engraving Melencolia I.]

One interpretation is that it depicts melancholy, a depressed state. The figure,
genius, has a wealth of fascinating things to explore including the compass, the
geometrical solid, the scale, and the hourglass. But the figure is unmoved; all of
these things lie unused. One of the potential delights, in the upper right, is a
$4 \times 4$ matrix whose rows, columns, and diagonals add to 34.




  16   3   2  13
   5  10  11   8
   9   6   7  12
   4  15  14   1



The middle entries on the bottom row give 1514, the date of the engraving. 
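
Both squares are quick to verify by machine. The following Python sketch checks the definition directly; the function name `magic_number` is ours, and it returns `None` for a square that is not magic.

```python
def magic_number(square):
    """Return the common sum if every row, column, and diagonal agrees, else None."""
    n = len(square)
    sums = [sum(row) for row in square]                                # rows
    sums += [sum(square[i][j] for i in range(n)) for j in range(n)]    # columns
    sums.append(sum(square[i][i] for i in range(n)))                   # main diagonal
    sums.append(sum(square[i][n - 1 - i] for i in range(n)))           # other diagonal
    return sums[0] if len(set(sums)) == 1 else None

lo_shu = [[4, 9, 2],
          [3, 5, 7],
          [8, 1, 6]]

durer = [[16, 3, 2, 13],
         [5, 10, 11, 8],
         [9, 6, 7, 12],
         [4, 15, 14, 1]]

print(magic_number(lo_shu), magic_number(durer))
```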

The above two squares are arrangements of $1, \ldots, n^2$. They are normal. The
$1 \times 1$ square whose sole entry is 1 is normal, there is no normal
$2 \times 2$ magic square by Exercise 2, and there are normal magic squares of
every other size; see [Wikipedia Magic Square]. Finding the number of normal
magic squares of each size is an unsolved problem; see [Online Encyclopedia of
Integer Sequences].

If we don't require that the squares be normal then we can say much more.
Every $1 \times 1$ square is magic, trivially. If the rows, columns, and diagonals
of a $2 \times 2$ matrix

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

add to $s$ then $a + b = s$, $c + d = s$, $a + c = s$, $b + d = s$, $a + d = s$,
and $b + c = s$. Exercise 2 shows that this system has the unique solution
$a = b = c = d = s/2$. So the set of $2 \times 2$ magic squares is a
one-dimensional subspace of $\mathcal{M}_{2 \times 2}$.

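In general, a sum of two same-sized magic squares is magic and a scalar
multiple of a magic square is magic so the set of $n \times n$ magic squares
$\mathscr{M}_n$ is a vector space, a subspace of $\mathcal{M}_{n \times n}$. This
Topic shows that for $n \geq 3$ the dimension of $\mathscr{M}_n$ is $n^2 - 2n$.
The set $\mathscr{M}_{n,0}$ of $n \times n$ magic squares with magic number 0 is
another subspace, and we will find the formula for its dimension also:
$n^2 - 2n - 1$ when $n \geq 3$.

We will first prove that $\dim \mathscr{M}_n = \dim \mathscr{M}_{n,0} + 1$. Define
the trace of a matrix to be the sum down its upper-left to lower-right diagonal,
$\text{Tr}(M) = m_{1,1} + \cdots + m_{n,n}$. Consider the restriction of the trace
to the magic squares $\text{Tr}\colon \mathscr{M}_n \to \mathbb{R}$. The null
space $\mathscr{N}(\text{Tr})$ is the set of magic squares with magic number zero,
$\mathscr{M}_{n,0}$. Observe that the trace is onto because for any $r$ in the
codomain $\mathbb{R}$ the $n \times n$ matrix whose entries are all $r/n$ is a
magic square with magic number $r$. Theorem Two.II.2.15 says that for any linear
map the dimension of the domain equals the dimension of the range space plus the
dimension of the null space, the map's rank plus its nullity. Here the domain is
$\mathscr{M}_n$, the range space is $\mathbb{R}$ and the null space is
$\mathscr{M}_{n,0}$, so we have that
$\dim \mathscr{M}_n = 1 + \dim \mathscr{M}_{n,0}$.

We will finish by showing that $\dim \mathscr{M}_{n,0} = n^2 - 2n - 1$ for
$n \geq 3$. (For $n = 1$ the dimension is clearly 0. Exercise 3 shows it is also 0
for $n = 2$.) If the rows, columns, and diagonals of a matrix

$$\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$$

add to zero then we have a $(2n + 2) \times n^2$ linear system.

$$\begin{aligned} a + b + c &= 0 \\ d + e + f &= 0 \\ g + h + i &= 0 \\ a + d + g &= 0 \\ b + e + h &= 0 \\ c + f + i &= 0 \\ a + e + i &= 0 \\ c + e + g &= 0 \end{aligned}$$

The matrix of coefficients for the particular cases of $n = 3$ and $n = 4$ are
below, with the rows and columns numbered to help in reading the proof. With
respect to the standard basis, each represents a linear map
$h\colon \mathbb{R}^{n^2} \to \mathbb{R}^{2n+2}$. The domain has dimension $n^2$
so if we show that the rank of the matrix is $2n + 1$ then we will have what we
want, that the dimension of the null space $\mathscr{M}_{n,0}$ is
$n^2 - (2n + 1)$.

          1 2 3   4 5 6   7 8 9
  ρ1      1 1 1   0 0 0   0 0 0
  ρ2      0 0 0   1 1 1   0 0 0
  ρ3      0 0 0   0 0 0   1 1 1
  ρ4      1 0 0   1 0 0   1 0 0
  ρ5      0 1 0   0 1 0   0 1 0
  ρ6      0 0 1   0 0 1   0 0 1
  ρ7      1 0 0   0 1 0   0 0 1
  ρ8      0 0 1   0 1 0   1 0 0

          1 2 3 4   5 6 7 8   9 10 11 12   13 14 15 16
  ρ1      1 1 1 1   0 0 0 0   0  0  0  0    0  0  0  0
  ρ2      0 0 0 0   1 1 1 1   0  0  0  0    0  0  0  0
  ρ3      0 0 0 0   0 0 0 0   1  1  1  1    0  0  0  0
  ρ4      0 0 0 0   0 0 0 0   0  0  0  0    1  1  1  1
  ρ5      1 0 0 0   1 0 0 0   1  0  0  0    1  0  0  0
  ρ6      0 1 0 0   0 1 0 0   0  1  0  0    0  1  0  0
  ρ7      0 0 1 0   0 0 1 0   0  0  1  0    0  0  1  0
  ρ8      0 0 0 1   0 0 0 1   0  0  0  1    0  0  0  1
  ρ9      1 0 0 0   0 1 0 0   0  0  1  0    0  0  0  1
  ρ10     0 0 0 1   0 0 1 0   0  1  0  0    1  0  0  0
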
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We want to show that the rank of the matrix of coefficients, the number of 
rows in a maximal linearly independent set, is 2n + 1 . The first n rows of the 
matrix of coefficients add to the same vector as the second n rows, the vector of 
all ones. So a maximal linearly independent must omit at least one row. We will 
show that the set of all rows but the first {p2 • . . Pin+i} is linearly independent. 
So consider this linear relationship. 

C2P2 H H C2nP2n + C2n+1 P2n+1 + C2ti+2P2ti+2 = (*) 

Now it gets messy. In the final two rows, in the first $n$ columns, is a subrow
that is all zeros except that it starts with a one in column 1 and a subrow
that is all zeros except that it ends with a one in column $n$. With $\rho_1$ not in
(*), each of those columns contains only two ones and so we can conclude that
$c_{2n+1} = -c_{n+1}$ as well as that $c_{2n+2} = -c_{2n}$.

Next consider the columns between those two. In the $n = 3$ illustration
above this includes only the second column while in the $n = 4$ matrix it includes
both the second and third columns. Each such column has a single one. That is,
for each column index $j \in \{2, \ldots, n - 1\}$ the column consists of only zeros except
for a one in row $n + j$, and hence $c_{n+j} = 0$.

On to the next block of columns, from $n + 1$ through $2n$. Column $n + 1$ has
only two ones (because $n \geq 3$ the ones in the last two rows do not fall in the first
column of this block). Thus $c_2 = -c_{n+1}$ and therefore $c_2 = c_{2n+1}$. Likewise,
from column $2n$ we conclude that $c_2 = -c_{2n}$ and so $c_2 = c_{2n+2}$.

Because $n \geq 3$ there is at least one column between column $n + 1$ and
column $2n$. In at least one of those columns a one appears in $\rho_{2n+1}$. If a
one also appears in that column in $\rho_{2n+2}$ then we have $c_2 = -(c_{2n+1} + c_{2n+2})$
(recall that $c_{n+j} = 0$ for $j \in \{2, \ldots, n - 1\}$). If a one does not appear in that
column in $\rho_{2n+2}$ then we have $c_2 = -c_{2n+1}$. In either case $c_2 = 0$, and thus
$c_{2n+1} = c_{2n+2} = 0$ and $c_{n+1} = c_{2n} = 0$.

If the next block of $n$-many columns is not the last then similarly conclude
from its first column that $c_3 = c_{n+1} = 0$.

Keep this up until we reach the last block of columns, those numbered
$(n - 1)n + 1$ through $n^2$. Because $c_{n+1} = \cdots = c_{2n} = 0$, column $n^2$ gives that
$c_n = -c_{2n+1} = 0$.

Therefore the rank of the matrix is $2n + 1$, as required.
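The rank claim can also be spot-checked by machine for small $n$. The sketch below is not part of the proof; it assumes the row ordering used above (the $n$ row sums, then the $n$ column sums, then the two diagonals), and the helper names magic_zero_coefficients and rank are made up for this illustration. It uses exact fractions so that no rounding can disturb the rank.

```python
from fractions import Fraction

def magic_zero_coefficients(n):
    """Coefficient matrix of the (2n+2) x n^2 system: n row sums, n column
    sums, the main diagonal, then the other diagonal. Unknowns are the
    matrix entries listed row by row."""
    cells = range(n * n)
    rows = []
    for i in range(n):                                   # row sums
        rows.append([1 if k // n == i else 0 for k in cells])
    for j in range(n):                                   # column sums
        rows.append([1 if k % n == j else 0 for k in cells])
    rows.append([1 if k // n == k % n else 0 for k in cells])          # main diagonal
    rows.append([1 if k // n + k % n == n - 1 else 0 for k in cells])  # other diagonal
    return rows

def rank(matrix):
    """Rank by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

for n in (3, 4, 5):
    A = magic_zero_coefficients(n)
    # prints n, the rank, and the nullity n^2 - rank
    print(n, rank(A), n * n - rank(A))
```

For each $n$ tried the rank should come out as $2n + 1$, so the nullity is $n^2 - 2n - 1$.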

The classic source on normal magic squares is [Ball & Coxeter]. More on 
the Lo Shu square is at [Wikipedia Lo Shu Square]. The proof given here began
with [Ward]. 

Exercises 

1 Let M be a 3x3 magic square with magic number s. 

(a) Prove that the sum of M's entries is 3s. 

(b) Prove that $s = 3 \cdot m_{2,2}$.

(c) Prove that $m_{2,2}$ is the average of the entries in its row, its column, and in
each diagonal.

(d) Prove that $m_{2,2}$ is the median of M's entries.

2 Solve the system a + b = s, c + d = s, a + c = s, b + d = s, a+d = s, and b + c = s. 

3 Show that $\dim \mathscr{M}_{2,0} = 0$.

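4 Let the trace function be $\text{Tr}(M) = m_{1,1} + \cdots + m_{n,n}$. Define also the sum down
the other diagonal $\text{Tr}^*(M) = m_{1,n} + \cdots + m_{n,1}$.

(a) Show that the two functions $\text{Tr}, \text{Tr}^*\colon \mathcal{M}_{n \times n} \to \mathbb{R}$ are linear.

(b) Show that the function $\theta\colon \mathcal{M}_{n \times n} \to \mathbb{R}^2$ given by $\theta(M) = (\text{Tr}(M), \text{Tr}^*(M))$ is
linear.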

(c) Generalize the prior item. 

5 A square matrix is semimagic if the rows and columns add to the same value, 
that is, if we drop the condition on the diagonals. 

(a) Show that the set of semimagic squares $\mathscr{H}_n$ is a subspace of $\mathcal{M}_{n \times n}$.

(b) Show that the set $\mathscr{H}_{n,0}$ of $n \times n$ semimagic squares with magic number 0 is
also a subspace of $\mathcal{M}_{n \times n}$.

6 [Beardon] Here is a slicker proof of the result of this Topic, when $n \geq 3$. See the
prior two exercises for some definitions and needed results.

(a) First show that $\dim \mathscr{H}_{n,0} = \dim \mathscr{M}_{n,0} + 2$. To do this, consider the function
$\theta\colon \mathcal{M}_{n \times n} \to \mathbb{R}^2$ sending a matrix $M$ to the ordered pair $(\text{Tr}(M), \text{Tr}^*(M))$.
Specifically, consider the restriction of that map $\theta\colon \mathscr{H}_{n,0} \to \mathbb{R}^2$ to the semimagic
squares with magic number zero. Clearly its null space is $\mathscr{M}_{n,0}$. Show that when
$n \geq 3$ this restriction $\theta$ is onto. (Hint: we need only find a basis for $\mathbb{R}^2$ that is
the image of two members of $\mathscr{H}_{n,0}$.)

(b) Let the function $\phi\colon \mathcal{M}_{n \times n} \to \mathcal{M}_{(n-1) \times (n-1)}$ be the identity map except that
it drops the final row and column: $\phi(M) = \hat{M}$ where $\hat{m}_{i,j} = m_{i,j}$ for all
$i, j \in \{1, \ldots, n - 1\}$. The check that $\phi$ is linear is easy. Consider $\phi$'s restriction to
the semimagic squares with magic number zero $\phi\colon \mathscr{H}_{n,0} \to \mathcal{M}_{(n-1) \times (n-1)}$. Show
that $\phi$ is one-to-one.

(c) Show that $\phi$ is onto.

(d) Conclude that $\mathscr{H}_{n,0}$ has dimension $(n - 1)^2$.

(e) Conclude that $\dim(\mathscr{M}_n) = n^2 - 2n$.



Markov Chains 



Here is a simple game: a player bets on coin tosses, a dollar each time, and the 
game ends either when the player has no money or is up to five dollars. If the 
player starts with three dollars, what is the chance that the game takes at least 
five flips? Twenty-five flips? 

At any point, this player has either $0, or $1, . . . , or $5. We say that the 
player is in the state $s_0$, $s_1$, ..., or $s_5$. In the game the player moves from state
to state. For instance, a player now in state $s_3$ has on the next flip a 0.5 chance
of moving to state $s_2$ and a 0.5 chance of moving to $s_4$. The boundary states
are a bit different; a player never leaves state $s_0$ or state $s_5$.

Let $p_i(n)$ be the probability that the player is in state $s_i$ after $n$ flips. Then,
for instance, we have that the probability of being in state $s_0$ after flip $n + 1$ is
$p_0(n + 1) = p_0(n) + 0.5 \cdot p_1(n)$. This matrix equation summarizes.



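$$\begin{pmatrix}
1.0 & 0.5 & 0.0 & 0.0 & 0.0 & 0.0 \\
0.0 & 0.0 & 0.5 & 0.0 & 0.0 & 0.0 \\
0.0 & 0.5 & 0.0 & 0.5 & 0.0 & 0.0 \\
0.0 & 0.0 & 0.5 & 0.0 & 0.5 & 0.0 \\
0.0 & 0.0 & 0.0 & 0.5 & 0.0 & 0.0 \\
0.0 & 0.0 & 0.0 & 0.0 & 0.5 & 1.0
\end{pmatrix}
\begin{pmatrix} p_0(n) \\ p_1(n) \\ p_2(n) \\ p_3(n) \\ p_4(n) \\ p_5(n) \end{pmatrix}
=
\begin{pmatrix} p_0(n+1) \\ p_1(n+1) \\ p_2(n+1) \\ p_3(n+1) \\ p_4(n+1) \\ p_5(n+1) \end{pmatrix}$$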



With the initial condition that the player starts with three dollars, these are 
components of the resulting vectors. 



              n = 0   n = 1   n = 2   n = 3   n = 4    n = 24
    p_0(n)    0       0       0       0.125   0.125    0.39600
    p_1(n)    0       0       0.25    0       0.1875   0.00276
    p_2(n)    0       0.5     0       0.375   0        0
    p_3(n)    1       0       0.5     0       0.3125   0.00447
    p_4(n)    0       0.5     0       0.25    0        0
    p_5(n)    0       0       0.25    0.25    0.375    0.59676





This exploration suggests that the game is not likely to go on for long, with 
the player quickly moving to an ending state. For instance, after the fourth flip 
there is a 0.50 probability that the game is already over. 
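The table can be regenerated by repeatedly applying the matrix to the starting vector. What follows is a minimal sketch rather than the book's own script; the names step and distribution are invented for the illustration, and exact fractions are used so that the early columns come out exactly.

```python
from fractions import Fraction

half = Fraction(1, 2)
# Column j holds the probabilities of leaving state s_j, so each column sums to 1.
T = [
    [1, half, 0,    0,    0,    0],
    [0, 0,    half, 0,    0,    0],
    [0, half, 0,    half, 0,    0],
    [0, 0,    half, 0,    half, 0],
    [0, 0,    0,    half, 0,    0],
    [0, 0,    0,    0,    half, 1],
]

def step(matrix, vector):
    """One flip: multiply the transition matrix by the probability vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def distribution(start_state, flips):
    """Probability vector after the given number of flips, starting in s_start_state."""
    v = [Fraction(int(i == start_state)) for i in range(6)]
    for _ in range(flips):
        v = step(T, v)
    return v

for n in (0, 1, 2, 3, 4, 24):
    print(n, [round(float(x), 5) for x in distribution(3, n)])

v4 = distribution(3, 4)
print("game over after four flips:", v4[0] + v4[5])
```

The last line adds the two boundary entries of the $n = 4$ vector, which is the chance that the game has already ended.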
This is a Markov chain, named for A. A. Markov, who worked in the first half
of the 1900's. Each vector is a probability vector, whose entries are nonnegative
real numbers that sum to 1. The matrix is a transition matrix or stochastic
matrix, whose entries are nonnegative reals and whose columns sum to 1.

A characteristic feature of a Markov chain model is that it is historyless in 
that the next state depends only on the current state, not on any prior ones. 
Thus, a player who arrives at $s_2$ by starting in state $s_3$ and then going to state $s_2$
has exactly the same chance of moving next to $s_3$ as does a player whose history
was to start in $s_3$ then go to $s_4$ then to $s_3$ and then to $s_2$.

Here is a Markov chain from sociology. A study ([Macdonald & Ridge], 
p. 202) divided occupations in the United Kingdom into three levels: executives 
and professionals, supervisors and skilled manual workers, and unskilled workers. 
They asked about two thousand men, "At what level are you, and at what level 
was your father when you were fourteen years old?" Here the Markov model 
assumption about history may seem reasonable — we may guess that while a 
parent's occupation has a direct influence on the occupation of the child, the 
grandparent's occupation likely has no such direct influence. This summarizes 
the study's conclusions. 



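$$\begin{pmatrix} .60 & .29 & \\ .26 & .37 & .27 \\ .14 & .34 & \end{pmatrix}
\begin{pmatrix} p_U(n) \\ p_M(n) \\ p_L(n) \end{pmatrix}
=
\begin{pmatrix} p_U(n+1) \\ p_M(n+1) \\ p_L(n+1) \end{pmatrix}$$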



For instance, looking at the middle class for the next generation, a child of an 
upper class worker has a 0.26 probability of becoming middle class, a child of 
a middle class worker has a 0.37 chance of being middle class, and a child of a 
lower class worker has a 0.27 probability of becoming middle class. With the 
initial distribution of the respondents' fathers given below, this table gives the
next five generations. 



              n = 0   n = 1   n = 2   n = 3   n = 4   n = 5
    p_U(n)    .12     .23     .29     .31     .32     .33
    p_M(n)    .32     .34     .34     .34     .33     .33
    p_L(n)    .56     .42     .37     .35     .34     .34



One more example. In professional American baseball there are two leagues, 
the American League and the National League. At the end of the annual season 
the team winning the American League and the team winning the National 
League play the World Series. The winner is the first team to take four games. 
That means that a series is in one of twenty-four states: 0-0 (no games won
yet by either team), 1-0 (one game won for the American League team and no 
games for the National League team), etc. 

Consider a series with a probability p that the American League team wins 
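each game. We have this.

$$\begin{pmatrix}
0 & 0 & 0 & 0 & \ldots \\
p & 0 & 0 & 0 & \ldots \\
1-p & 0 & 0 & 0 & \ldots \\
0 & p & 0 & 0 & \ldots \\
0 & 1-p & p & 0 & \ldots \\
0 & 0 & 1-p & 0 & \ldots \\
\vdots & \vdots & \vdots & \vdots &
\end{pmatrix}
\begin{pmatrix} p_{0-0}(n) \\ p_{1-0}(n) \\ p_{0-1}(n) \\ p_{2-0}(n) \\ p_{1-1}(n) \\ p_{0-2}(n) \\ \vdots \end{pmatrix}
=
\begin{pmatrix} p_{0-0}(n+1) \\ p_{1-0}(n+1) \\ p_{0-1}(n+1) \\ p_{2-0}(n+1) \\ p_{1-1}(n+1) \\ p_{0-2}(n+1) \\ \vdots \end{pmatrix}$$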



An especially interesting special case is when the teams are evenly matched, 
$p = 0.50$. This table below lists the resulting components of the $n = 0$ through
$n = 7$ vectors. (The code to generate this table in the computer algebra system
Octave follows the exercises.) 

Note that evenly-matched teams are likely to have a long series — there is a 
probability of 0.625 that the series goes at least six games. 
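The 0.625 figure can be checked without writing out the full twenty-four state matrix. The sketch below is an illustration only: it stores the probability vector as a dictionary keyed by the score, and the function names series_step, series_distribution, and prob_at_least_six_games are invented here. A series goes at least six games exactly when neither team has four wins after five games.

```python
from fractions import Fraction

def series_step(dist, p):
    """Play one game. dist maps a state (a, b) to its probability, where a is
    the American League team's wins and b is the National League team's wins."""
    nxt = {}
    for (a, b), prob in dist.items():
        if a == 4 or b == 4:                       # series is over: stay put
            nxt[(a, b)] = nxt.get((a, b), 0) + prob
        else:
            nxt[(a + 1, b)] = nxt.get((a + 1, b), 0) + p * prob
            nxt[(a, b + 1)] = nxt.get((a, b + 1), 0) + (1 - p) * prob
    return nxt

def series_distribution(p, games):
    """Distribution over states after the given number of steps, starting at 0-0."""
    dist = {(0, 0): Fraction(1)}
    for _ in range(games):
        dist = series_step(dist, p)
    return dist

def prob_at_least_six_games(p):
    """Chance that the series is still undecided after five games."""
    after_five = series_distribution(p, 5)
    return sum(prob for (a, b), prob in after_five.items() if a < 4 and b < 4)

print(prob_at_least_six_games(Fraction(1, 2)))   # 5/8
```

Passing a different value of p, such as Fraction(11, 20), gives the corresponding figure for unevenly matched teams.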





    state   n = 0   n = 1   n = 2   n = 3   n = 4    n = 5    n = 6     n = 7
    0-0     1       0       0       0       0        0        0         0
    1-0     0       0.5     0       0       0        0        0         0
    0-1     0       0.5     0       0       0        0        0         0
    2-0     0       0       0.25    0       0        0        0         0
    1-1     0       0       0.5     0       0        0        0         0
    0-2     0       0       0.25    0       0        0        0         0
    3-0     0       0       0       0.125   0        0        0         0
    2-1     0       0       0       0.375   0        0        0         0
    1-2     0       0       0       0.375   0        0        0         0
    0-3     0       0       0       0.125   0        0        0         0
    4-0     0       0       0       0       0.0625   0.0625   0.0625    0.0625
    3-1     0       0       0       0       0.25     0        0         0
    2-2     0       0       0       0       0.375    0        0         0
    1-3     0       0       0       0       0.25     0        0         0
    0-4     0       0       0       0       0.0625   0.0625   0.0625    0.0625
    4-1     0       0       0       0       0        0.125    0.125     0.125
    3-2     0       0       0       0       0        0.3125   0         0
    2-3     0       0       0       0       0        0.3125   0         0
    1-4     0       0       0       0       0        0.125    0.125     0.125
    4-2     0       0       0       0       0        0        0.15625   0.15625
    3-3     0       0       0       0       0        0        0.3125    0
    2-4     0       0       0       0       0        0        0.15625   0.15625
    4-3     0       0       0       0       0        0        0         0.15625
    3-4     0       0       0       0       0        0        0         0.15625



Markov chains are a widely used application of matrix operations. They
also give us an example of the use of matrices where we do not consider the
significance of the maps represented by the matrices. For more on Markov chains,
there are many sources such as [Kemeny & Snell] and [Iosifescu].
Exercises

Use a computer for these problems. You can, for instance, adapt the Octave 
script given below. 

1 These questions refer to the coin-flipping game. 

(a) Check the computations in the table at the end of the first paragraph.

(b) Consider the second row of the vector table. Note that this row has alternating
0's. Must $p_1(j)$ be 0 when $j$ is odd? Prove that it must be, or produce a
counterexample.

(c) Perform a computational experiment to estimate the chance that the player 
ends at five dollars, starting with one dollar, two dollars, and four dollars. 

2 [Feller] We consider throws of a die, and say the system is in state $s_i$ if the largest
number yet appearing on the die was $i$.

(a) Give the transition matrix.

(b) Start the system in state $s_1$, and run it for five throws. What is the vector at
the end?

3 [Kelton] There has been much interest in whether industries in the United States 
are moving from the Northeast and North Central regions to the South and West, 
motivated by the warmer climate, by lower wages, and by less unionization. Here 
is the transition matrix for large firms in Electric and Electronic Equipment. 





            NE      NC      S       W       Z
    NE      0.787   0       0       0.111   0.102
    NC      0       0.966   0.034   0       0
    S       0       0.063   0.937   0       0
    W       0       0       0.074   0.612   0.314
    Z       0.021   0.009   0.005   0.010   0.954





For example, a firm in the Northeast region will be in the West region next year 
with probability 0.111. (The Z entry is a "birth-death" state. For instance, with 
probability 0.102 a large Electric and Electronic Equipment firm from the Northeast 
will move out of this system next year: go out of business, move abroad, or move to 
another category of firm. There is a 0.021 probability that a firm in the National 
Census of Manufacturers will move into Electronics, or be created, or move in 
from abroad, into the Northeast. Finally, with probability 0.954 a firm out of the 
categories will stay out, according to this research.) 

(a) Does the Markov model assumption of lack of history seem justified? 

(b) Assume that the initial distribution is even, except that the value at Z is 0.9. 
Compute the vectors for n = 1 through n = 4. 

(c) Suppose that the initial distribution is this. 

NE NC S W Z 

0.0000 0.6522 0.3478 0.0000 0.0000 
Calculate the distributions for n = 1 through n = 4. 

(d) Find the distribution for n = 50 and n = 51 . Has the system settled down to 
an equilibrium? 

4 [Wickens] Here is a model of some kinds of learning. The learner starts in an
undecided state $s_U$. Eventually the learner has to decide to do either response A
(that is, end in state $s_A$) or response B (ending in $s_B$). However, the learner doesn't
jump right from undecided to sure that A is the correct thing to do (or B). Instead,
the learner spends some time in a "tentative-A" state, or a "tentative-B" state,
trying the response out (denoted here $t_A$ and $t_B$). Imagine that once the learner has
decided, it is final, so once in $s_A$ or $s_B$, the learner stays there. For the other state
changes, we can posit transitions with probability $p$ in either direction.

(a) Construct the transition matrix.

(b) Take $p = 0.25$ and take the initial vector to be 1 at $s_U$. Run this for five steps.
What is the chance of ending up at $s_A$?

(c) Do the same for $p = 0.20$.

(d) Graph $p$ versus the chance of ending at $s_A$. Is there a threshold value for $p$,
above which the learner is almost sure not to take longer than five steps?

5 A certain town is in a certain country (this is a hypothetical problem). Each year
ten percent of the town dwellers move to other parts of the country. Each year one
percent of the people from elsewhere move to the town. Assume that there are two
states $s_T$, living in town, and $s_C$, living elsewhere.

(a) Construct the transition matrix.

(b) Starting with an initial distribution $s_T = 0.3$ and $s_C = 0.7$, get the results for
the first ten years.

(c) Do the same for $s_T = 0.2$.

(d) Are the two outcomes alike or different? 

6 For the World Series application, use a computer to generate the seven vectors for 
p = 0.55 and p = 0.6. 

(a) What is the chance of the National League team winning it all, even though 
they have only a probability of 0.45 or 0.40 of winning any one game? 

(b) Graph the probability p against the chance that the American League team 
wins it all. Is there a threshold value — a p above which the better team is 
essentially ensured of winning? 

7 Above we define a transition matrix to have each entry nonnegative and each 
column sum to 1 . 

(a) Check that the three transition matrices shown in this Topic meet these two 
conditions. Must any transition matrix do so? 

(b) Observe that if $A\vec{v}_0 = \vec{v}_1$ and $A\vec{v}_1 = \vec{v}_2$ then $A^2$ is a transition matrix from
$\vec{v}_0$ to $\vec{v}_2$. Show that a power of a transition matrix is also a transition matrix.

(c) Generalize the prior item by proving that the product of two appropriately-sized
transition matrices is a transition matrix.



Computer Code 

This script markov.m for the computer algebra system Octave generated the 
table of World Series outcomes. (The hash character # marks the rest of a line 
as a comment.) 



# Octave script file to compute chance of World Series outcomes.
function w = markov(p,v)
  q = 1-p;
  A=[0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 0-0
     p,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 1-0
     q,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 0-1
     0,p,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 2-0
     0,q,p,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 1-1
     0,0,q,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 0-2
     0,0,0,p,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 3-0
     0,0,0,q,p,0, 0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 2-1
     0,0,0,0,q,p, 0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 1-2
     0,0,0,0,0,q, 0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 0-3
     0,0,0,0,0,0, p,0,0,0,1,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 4-0
     0,0,0,0,0,0, q,p,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 3-1
     0,0,0,0,0,0, 0,q,p,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 2-2
     0,0,0,0,0,0, 0,0,q,p,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0;  # 1-3
     0,0,0,0,0,0, 0,0,0,q,0,0, 0,0,1,0,0,0, 0,0,0,0,0,0;  # 0-4
     0,0,0,0,0,0, 0,0,0,0,0,p, 0,0,0,1,0,0, 0,0,0,0,0,0;  # 4-1
     0,0,0,0,0,0, 0,0,0,0,0,q, p,0,0,0,0,0, 0,0,0,0,0,0;  # 3-2
     0,0,0,0,0,0, 0,0,0,0,0,0, q,p,0,0,0,0, 0,0,0,0,0,0;  # 2-3
     0,0,0,0,0,0, 0,0,0,0,0,0, 0,q,0,0,0,0, 1,0,0,0,0,0;  # 1-4
     0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,p,0, 0,1,0,0,0,0;  # 4-2
     0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,q,p, 0,0,0,0,0,0;  # 3-3
     0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,q, 0,0,0,1,0,0;  # 2-4
     0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,p,0,1,0;  # 4-3
     0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,0,0,0,0, 0,0,q,0,0,1]; # 3-4
  w = A * v;
endfunction

Then the Octave session was this.

> v0=[1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0]
> p=.5
> v1=markov(p,v0)
> v2=markov(p,v1)



Translating to another computer algebra system should be easy — all have 
commands similar to these. 




Orthonormal Matrices 



In The Elements, Euclid considers two figures to be the same if they have 
the same size and shape. That is, while the triangles below are not equal 
because they are not the same set of points, they are congruent — essentially 
indistinguishable for Euclid's purposes — because we can imagine picking the 
plane up, sliding it over and rotating it a bit, although not warping or stretching 
it, and then putting it back down, to superimpose the first figure on the second. 
(Euclid never explicitly states this principle but he uses it often [Casey].) 



In modern terminology, "picking the plane up ... " is considering a map from 
the plane to itself. Euclid considers only transformations of the plane that may 
slide or turn the plane but not bend or stretch it. Accordingly, we define a map 
$f\colon \mathbb{R}^2 \to \mathbb{R}^2$ to be distance-preserving or a rigid motion or an isometry, if for
all points $P_1, P_2 \in \mathbb{R}^2$, the distance from $f(P_1)$ to $f(P_2)$ equals the distance from
$P_1$ to $P_2$. We also define a plane figure to be a set of points in the plane and we
say that two figures are congruent if there is a distance-preserving map from 
the plane to itself that carries one figure onto the other. 

Many statements from Euclidean geometry follow easily from these definitions. 
Some are: (i) collinearity is invariant under any distance-preserving map (that is,
if $P_1$, $P_2$, and $P_3$ are collinear then so are $f(P_1)$, $f(P_2)$, and $f(P_3)$), (ii) betweenness
is invariant under any distance-preserving map (if $P_2$ is between $P_1$ and $P_3$ then
so is $f(P_2)$ between $f(P_1)$ and $f(P_3)$), (iii) the property of being a triangle is
invariant under any distance-preserving map (if a figure is a triangle then the
image of that figure is also a triangle), and (iv) the property of being a circle
is invariant under any distance-preserving map. In 1872, F. Klein suggested
that we can define Euclidean geometry as the study of properties that are
invariant under these maps. (This forms part of Klein's Erlanger Program,
which proposes the organizing principle that we can describe each kind of
geometry (Euclidean, projective, etc.) as the study of the properties that are
invariant under some group of transformations. The word 'group' here means
more than just 'collection', but that lies outside of our scope.)

We can use linear algebra to characterize the distance-preserving maps of 
the plane. 

We must first observe that there are distance-preserving transformations of 
the plane that are not linear. The obvious example is this translation. 



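However, this example turns out to be the only example, in the sense that if $f$ is
distance-preserving and sends $\vec{0}$ to $\vec{v}_0$ then the map $\vec{v} \mapsto f(\vec{v}) - \vec{v}_0$ is linear. That
will follow immediately from this statement: a map $t$ that is distance-preserving
and sends $\vec{0}$ to itself is linear. To prove this equivalent statement, let

$$t(\vec{e}_1) = \begin{pmatrix} a \\ b \end{pmatrix} \qquad t(\vec{e}_2) = \begin{pmatrix} c \\ d \end{pmatrix}$$

for some $a, b, c, d \in \mathbb{R}$. Then to show that $t$ is linear we can show that it can be
represented by a matrix, that is, that $t$ acts in this way for all $x, y \in \mathbb{R}$.

$$\vec{v} = \begin{pmatrix} x \\ y \end{pmatrix} \;\stackrel{t}{\longmapsto}\; \begin{pmatrix} ax + cy \\ bx + dy \end{pmatrix} \qquad (*)$$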

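Recall that if we fix three non-collinear points then we can determine any point
by giving its distance from those three. So we can determine any point $\vec{v}$ in
the domain by its distance from $\vec{0}$, $\vec{e}_1$, and $\vec{e}_2$. Similarly, we can determine
any point $t(\vec{v})$ in the codomain by its distance from the three fixed points $t(\vec{0})$,
$t(\vec{e}_1)$, and $t(\vec{e}_2)$ (these three are not collinear because, as mentioned above,
collinearity is invariant and $\vec{0}$, $\vec{e}_1$, and $\vec{e}_2$ are not collinear). In fact, because
$t$ is distance-preserving, we can say more: for the point $\vec{v}$ in the plane that is
determined by being the distance $d_0$ from $\vec{0}$, the distance $d_1$ from $\vec{e}_1$, and the
distance $d_2$ from $\vec{e}_2$, its image $t(\vec{v})$ must be the unique point in the codomain
that is determined by being $d_0$ from $t(\vec{0})$, $d_1$ from $t(\vec{e}_1)$, and $d_2$ from $t(\vec{e}_2)$.
Because of the uniqueness, checking that the action in (*) works in the $d_0$, $d_1$,
and $d_2$ cases

$$\text{dist}\Bigl(\begin{pmatrix} x \\ y \end{pmatrix}, \vec{0}\Bigr) = \text{dist}\Bigl(t\begin{pmatrix} x \\ y \end{pmatrix}, t(\vec{0})\Bigr) = \text{dist}\Bigl(\begin{pmatrix} ax + cy \\ bx + dy \end{pmatrix}, \vec{0}\Bigr)$$

(we assumed that $t$ maps $\vec{0}$ to itself)

$$\text{dist}\Bigl(\begin{pmatrix} x \\ y \end{pmatrix}, \vec{e}_1\Bigr) = \text{dist}\Bigl(t\begin{pmatrix} x \\ y \end{pmatrix}, t(\vec{e}_1)\Bigr) = \text{dist}\Bigl(\begin{pmatrix} ax + cy \\ bx + dy \end{pmatrix}, \begin{pmatrix} a \\ b \end{pmatrix}\Bigr)$$

and

$$\text{dist}\Bigl(\begin{pmatrix} x \\ y \end{pmatrix}, \vec{e}_2\Bigr) = \text{dist}\Bigl(t\begin{pmatrix} x \\ y \end{pmatrix}, t(\vec{e}_2)\Bigr) = \text{dist}\Bigl(\begin{pmatrix} ax + cy \\ bx + dy \end{pmatrix}, \begin{pmatrix} c \\ d \end{pmatrix}\Bigr)$$

suffices to show that (*) describes $t$. Those checks are routine.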

Thus we can write any distance-preserving $f\colon \mathbb{R}^2 \to \mathbb{R}^2$ as $f(\vec{v}) = t(\vec{v}) + \vec{v}_0$
for some constant vector $\vec{v}_0$ and linear map $t$ that is distance-preserving. So
what is left in order to understand distance-preserving maps is to understand
distance-preserving linear maps.

Not every linear map is distance-preserving. For example $\vec{v} \mapsto 2\vec{v}$ does not
preserve distances. 

But there is a neat characterization: a linear transformation $t$ of the plane
is distance-preserving if and only if both $\|t(\vec{e}_1)\| = \|t(\vec{e}_2)\| = 1$, and $t(\vec{e}_1)$ is
orthogonal to $t(\vec{e}_2)$. The 'only if' half of that statement is easy: because $t$
is distance-preserving it must preserve the lengths of vectors and because $t$
is distance-preserving the Pythagorean theorem shows that it must preserve
orthogonality. To show the 'if' half we can check that the map preserves lengths
of vectors because then for all $\vec{p}$ and $\vec{q}$ the distance between the two is preserved
$\|t(\vec{p} - \vec{q})\| = \|t(\vec{p}) - t(\vec{q})\| = \|\vec{p} - \vec{q}\|$. For that check let

$$\vec{v} = \begin{pmatrix} x \\ y \end{pmatrix} \qquad t(\vec{e}_1) = \begin{pmatrix} a \\ b \end{pmatrix} \qquad t(\vec{e}_2) = \begin{pmatrix} c \\ d \end{pmatrix}$$

and with the 'if' assumptions that $a^2 + b^2 = c^2 + d^2 = 1$ and $ac + bd = 0$ we
have this.

$$\begin{aligned}
\|t(\vec{v})\|^2 &= (ax + cy)^2 + (bx + dy)^2 \\
&= a^2x^2 + 2acxy + c^2y^2 + b^2x^2 + 2bdxy + d^2y^2 \\
&= x^2(a^2 + b^2) + y^2(c^2 + d^2) + 2xy(ac + bd) \\
&= x^2 + y^2
\end{aligned}$$



One thing that is neat about this characterization is that we can easily 
recognize matrices that represent such a map with respect to the standard
bases: the columns are of length one and are mutually orthogonal. This is an 
orthonormal matrix or orthogonal matrix (people often use the second term 
to mean not just that the columns are orthogonal but also that they have length 
one). 
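The column test is easy to automate. The sketch below is only an illustration; the function names is_orthonormal, apply, and dist are invented here, and the comparison uses a small floating point tolerance that is an assumption of the sketch, not part of the characterization. It checks a rotation matrix, which passes, and the doubling map, which fails and visibly stretches distances.

```python
import math

def is_orthonormal(m, tol=1e-12):
    """m = [[a, c], [b, d]]: the columns (a, b) and (c, d) are the images of
    the standard basis vectors. Test unit length and orthogonality."""
    (a, c), (b, d) = m
    return (abs(a * a + b * b - 1) < tol and
            abs(c * c + d * d - 1) < tol and
            abs(a * c + b * d) < tol)

def apply(m, v):
    """Matrix-vector product for a 2x2 matrix and a point in the plane."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

theta = math.pi / 6
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]
doubling = [[2, 0], [0, 2]]

p, q = (3.0, 1.0), (-2.0, 5.0)
for name, m in (("rotation", rotation), ("doubling", doubling)):
    # prints the test result, the distance between the images, and the original distance
    print(name, is_orthonormal(m), dist(apply(m, p), apply(m, q)), dist(p, q))
```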

We can use this to understand the geometric actions of distance-preserving
maps. Because $\|t(\vec{v})\| = \|\vec{v}\|$, the map $t$ sends any $\vec{v}$ somewhere on the circle
about the origin that has radius equal to the length of $\vec{v}$. In particular, $\vec{e}_1$ and $\vec{e}_2$
map to the unit circle. What's more, once we fix the unit vector $\vec{e}_1$ as mapped
to the vector with components $a$ and $b$ then there are only two places where $\vec{e}_2$
can go if its image is to be perpendicular to the first vector's image: it can map
either to one where $\vec{e}_2$ maintains its position a quarter circle counterclockwise from $\vec{e}_1$

$$\text{Rep}_{\mathcal{E}_2,\mathcal{E}_2}(t) = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$$

or to one where it goes a quarter circle clockwise.

$$\text{Rep}_{\mathcal{E}_2,\mathcal{E}_2}(t) = \begin{pmatrix} a & b \\ b & -a \end{pmatrix}$$




We can geometrically describe these two cases. Let $\theta$ be the counterclockwise
angle between the x-axis and the image of $\vec{e}_1$. The first matrix above represents,
with respect to the standard bases, a rotation of the plane by $\theta$ radians.



The second matrix above represents a reflection of the plane through the line
bisecting the angle between $\vec{e}_1$ and $t(\vec{e}_1)$.



(This picture shows $\vec{e}_1$ reflected up into the first quadrant and $\vec{e}_2$ reflected down
into the fourth quadrant.)

Note: in the domain the angle between $\vec{e}_1$ and $\vec{e}_2$ runs counterclockwise, and
in the first map above the angle from $t(\vec{e}_1)$ to $t(\vec{e}_2)$ is also counterclockwise,
so it preserves the orientation of the angle. But the second map reverses the
orientation. A distance-preserving map is direct if it preserves orientations and
opposite if it reverses orientation.
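As a concrete check of the two cases, suppose for illustration that $a = 0$ and $b = 1$ (these values are chosen here, not taken from the pictures), so that $\vec{e}_1$ maps to $\vec{e}_2$ and $\theta = \pi/2$. Applying each matrix to $\vec{e}_2$ shows where the second basis vector lands.

```latex
\[
  \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
  \begin{pmatrix} 0 \\ 1 \end{pmatrix}
  = \begin{pmatrix} -1 \\ 0 \end{pmatrix}
  \qquad\qquad
  \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
  \begin{pmatrix} 0 \\ 1 \end{pmatrix}
  = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\]
```

In the first case the image of $\vec{e}_2$ is a quarter circle counterclockwise from the image of $\vec{e}_1$, so the map is direct; it is the rotation by $\pi/2$. In the second case the two basis vectors swap, which is the reflection through the line $y = x$, the bisector of the angle between $\vec{e}_1$ and its image, and the map is opposite.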

So, we have characterized the Euclidean study of congruence. It considers, 
for plane figures, the properties that are invariant under combinations of (i) a
rotation followed by a translation, or (ii) a reflection followed by a translation 
(a reflection followed by a non-trivial translation is a glide reflection). 

Another idea, besides congruence of figures, encountered in elementary 
geometry is that figures are similar if they are congruent after a change of scale. 
These two triangles are similar since the second is the same shape as the first, 
but 3/2-ths the size. 



From the above work we have that figures are similar if there is an orthonormal
matrix $T$ such that the points $\vec{q}$ on one figure are the images of the points $\vec{p}$ on
the other figure by $\vec{q} = (kT)\vec{p} + \vec{p}_0$ for some nonzero real number $k$ and constant
vector $\vec{p}_0$.

Although these ideas are from Euclid, mathematics is timeless and they are 
still in use today. One application of the maps studied above is in computer 
graphics. We can, for example, animate this top view of a cube by putting 
together film frames of it rotating; that's a rigid motion. 





Frame 1 Frame 2 Frame 3 

We could also make the cube appear to be moving away from us by producing 
film frames of it shrinking, which gives us figures that are similar. 



Frame 1: Frame 2: Frame 3: 



Computer graphics incorporates techniques from linear algebra in many other 
ways (see Exercise 4). 

A beautiful book that explores some of this area is [Weyl]. More on groups, 
of transformations and otherwise, is in any book on Modern Algebra, for instance 
[Birkhoff & MacLane]. More on Klein and the Erlanger Program is in [Yaglom]. 

Exercises 



1 Decide if each of these is an orthonormal matrix.

(a) $\begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ -1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}$

(b) $\begin{pmatrix} 1/\sqrt{3} & -1/\sqrt{3} \\ -1/\sqrt{3} & -1/\sqrt{3} \end{pmatrix}$

(c) $\begin{pmatrix} & \\ -\sqrt{2}/\sqrt{3} & -1/\sqrt{3} \end{pmatrix}$

2 Write down the formula for each of these distance-preserving maps.

(a) the map that rotates $\pi/6$ radians, and then translates by $\vec{e}_2$

(b) the map that reflects about the line $y = 2x$

(c) the map that reflects about $y = -2x$ and translates over 1 and up 1

3 (a) The proof that a map that is distance-preserving and sends the zero vector
to itself incidentally shows that such a map is one-to-one and onto (the point
in the domain determined by $d_0$, $d_1$, and $d_2$ corresponds to the point in the
codomain determined by those three). Therefore any distance-preserving map
has an inverse. Show that the inverse is also distance-preserving.

(b) Prove that congruence is an equivalence relation between plane figures.

4 In practice the matrix for the distance-preserving linear transformation and the 
translation are often combined into one. Check that these two computations yield 
the same first two components. 

(These are homogeneous coordinates; see the Topic on Projective Geometry). 

5 (a) Verify that the properties described in the second paragraph of this Topic as 
invariant under distance-preserving maps are indeed so. 

(b) Give two more properties that are of interest in Euclidean geometry from 
your experience in studying that subject that are also invariant under distance- 
preserving maps. 

(c) Give a property that is not of interest in Euclidean geometry and is not 
invariant under distance-preserving maps. 




Determinants 



In the first chapter we highlighted the special case of linear systems with the
same number of equations as unknowns, those of the form $T\vec{x} = \vec{b}$ where $T$ is
a square matrix. We noted a distinction between two classes of $T$'s. If $T$ is
associated with a unique solution for any vector $\vec{b}$ of constants, such as for the
homogeneous system $T\vec{x} = \vec{0}$, then $T$ is associated with a unique solution for
every vector $\vec{b}$. We call such a matrix of coefficients nonsingular. The other
kind of T, where every linear system for which it is the matrix of coefficients has 
either no solution or infinitely many solutions, is singular. 

In our work since then the value of this distinction has been a theme. For 
instance, we now know that an nxn matrix T is nonsingular if and only if each 
of these holds: 

• any system $T\vec{x} = \vec{b}$ has a solution and that solution is unique;

• Gauss-Jordan reduction of $T$ yields an identity matrix;

• the rows of $T$ form a linearly independent set;

• the columns of $T$ form a linearly independent set, and a basis for $\mathbb{R}^n$;

• any map that $T$ represents is an isomorphism;

• an inverse matrix $T^{-1}$ exists.

So when we look at a particular square matrix, one of the first things that we 
ask is whether it is nonsingular. 

Naturally there is a formula that determines whether T is nonsingular. This 
chapter develops that formula. More precisely, we will develop infinitely many 
formulas, one for 1 x 1 matrices, one for 2x2 matrices, etc. These formulas are 
related, that is, we will develop a family of formulas, a scheme that describes 
the formula for each size. 

Since we will restrict the discussion to square matrices, in this chapter we 
will often simply say 'matrix' in place of 'square matrix'. 
I Definition

Determining nonsingularity is trivial for $1 \times 1$ matrices.

$$\begin{pmatrix} a \end{pmatrix} \text{ is nonsingular iff } a \neq 0$$

Corollary Three.IV.4.11 gives the formula for the inverse of a $2 \times 2$ matrix.

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \text{ is nonsingular iff } ad - bc \neq 0$$

We can produce the $3 \times 3$ formula as we did the prior one, although the computation
is intricate (see Exercise 9).

$$\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \text{ is nonsingular iff } aei + bfg + cdh - hfa - idb - gec \neq 0$$

With these cases in mind, we posit a family of formulas: $a$, $ad - bc$, etc. For
each $n$ the formula defines a determinant function $\det_{n \times n}\colon \mathcal{M}_{n \times n} \to \mathbb{R}$ such
that an $n \times n$ matrix $T$ is nonsingular if and only if $\det_{n \times n}(T) \neq 0$. (We usually
omit the subscript $n \times n$ because the size of $T$ tells us which determinant function
we mean.)
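The three formulas translate directly into code. This is a minimal sketch with invented function names (det1, det2, det3, is_nonsingular); it covers only the three sizes worked out so far and is not the general definition, which comes later.

```python
def det1(m):
    """1x1 formula: a."""
    return m[0][0]

def det2(m):
    """2x2 formula: ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c

def det3(m):
    """3x3 formula: aei + bfg + cdh - hfa - idb - gec."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*e*i + b*f*g + c*d*h - h*f*a - i*d*b - g*e*c

def is_nonsingular(m):
    """Pick the formula by size; the matrix is nonsingular iff it is nonzero."""
    formula = {1: det1, 2: det2, 3: det3}[len(m)]
    return formula(m) != 0

print(is_nonsingular([[1, 2], [3, 4]]))          # True, since 1*4 - 2*3 = -2
print(is_nonsingular([[1, 2], [2, 4]]))          # False, since 1*4 - 2*2 = 0
print(det3([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # 0, so this matrix is singular
```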




1.1 Exploration 

This is an optional motivation of the general definition, suggesting how a 
person might develop that formula. The definition is in the next subsection. 

Above, in each case the matrix is nonsingular if and only if some formula is 
nonzero. But the three cases don't show an obvious pattern for the formula. We 
may spot that the $1 \times 1$ term $a$ has one letter, that the $2 \times 2$ terms $ad$ and $bc$
have two letters, and that the $3 \times 3$ terms $aei$, etc., each have three letters. We
may also spot that in those terms there is a letter from each row and column of
the matrix, e.g., in the $cdh$ term one letter comes from each row and from each
column.

$$\begin{pmatrix} & & c \\ d & & \\ & h & \end{pmatrix}$$

But these observations are perhaps more puzzling than enlightening. For instance, 
we might wonder why we add some terms but subtract others. 

A good problem solving strategy is to see what properties a solution must have and then search for something with those properties. So we shall start by asking what properties the formulas must have.

At this point, our main way to decide whether a matrix is singular or nonsingular is to do Gaussian reduction and then check whether the diagonal of the resulting echelon form matrix has any zeroes, that is, to check whether the product down the diagonal is zero. So we could guess that whatever formula we find, the proof that it is right may involve applying Gauss's Method to the matrix to show that in the end the product down the diagonal is zero if and only if our formula gives zero.

This suggests a plan: we will look for a family of determinant formulas that are unaffected by row operations and such that the determinant of an echelon form matrix is the product of its diagonal entries. In the rest of this subsection we will test this plan against the 2x2 and 3x3 formulas. In the end we will have to modify the "unaffected by row operations" part, but not by much.

The first step in testing this plan is to see whether the 2x2 and 3x3 formulas are unaffected by the row operation of combining: if

$$T \xrightarrow{k\rho_i + \rho_j} \hat{T}$$

then is $\det(\hat{T}) = \det(T)$? This check of the 2x2 determinant after the $k\rho_1 + \rho_2$ operation

$$\det\begin{pmatrix} a & b \\ ka + c & kb + d \end{pmatrix} = a(kb + d) - (ka + c)b = ad - bc$$

shows that it is indeed unchanged, and the other 2x2 combination $k\rho_2 + \rho_1$ gives the same result. The 3x3 combination $k\rho_3 + \rho_2$ leaves the determinant unchanged

$$\begin{aligned} \det\begin{pmatrix} a & b & c \\ kg + d & kh + e & ki + f \\ g & h & i \end{pmatrix} &= a(kh + e)i + b(ki + f)g + c(kg + d)h \\ &\quad - h(ki + f)a - i(kg + d)b - g(kh + e)c \\ &= aei + bfg + cdh - hfa - idb - gec \end{aligned}$$

as do the other 3x3 row combination operations.

So there seems to be promise in the plan. Of course, perhaps if we had 
worked out the 4x4 determinant formula and tested it then we might have found 
that it is affected by row combinations. This is an exploration and we do not 
yet have all the facts. Nonetheless, so far, so good. 

Next is to compare $\det(\hat{T})$ with $\det(T)$ for row swaps. We now hit a snag: the 2x2 row swap $\rho_1 \leftrightarrow \rho_2$ does not yield $ad - bc$.

$$\det\begin{pmatrix} c & d \\ a & b \end{pmatrix} = cb - ad$$

And this $\rho_1 \leftrightarrow \rho_3$ swap inside of a 3x3 matrix

$$\det\begin{pmatrix} g & h & i \\ d & e & f \\ a & b & c \end{pmatrix} = gec + hfa + idb - bfg - cdh - aei$$

also does not give the same determinant as before the swap; again there is a sign change. Trying a different 3x3 swap $\rho_1 \leftrightarrow \rho_2$

$$\det\begin{pmatrix} d & e & f \\ a & b & c \\ g & h & i \end{pmatrix} = dbi + ecg + fah - hcd - iae - gbf$$

also gives a change of sign.

So row swaps seem to change the sign of a determinant formula. This does not wreck our plan entirely. We intend to decide nonsingularity by considering only whether the formula gives zero, not by considering its sign. Therefore, instead of expecting determinant formulas to be entirely unaffected by row operations, we modify our plan to have them change sign on a swap.

Obviously we will next compare $\det(\hat{T})$ with $\det(T)$ for the operation of multiplying a row by a scalar k. This

$$\det\begin{pmatrix} a & b \\ kc & kd \end{pmatrix} = a(kd) - (kc)b = k \cdot (ad - bc)$$

ends with the determinant multiplied by k, and the other 2x2 case has the same result. This 3x3 case ends the same way

$$\begin{aligned} \det\begin{pmatrix} a & b & c \\ d & e & f \\ kg & kh & ki \end{pmatrix} &= ae(ki) + bf(kg) + cd(kh) - (kh)fa - (ki)db - (kg)ec \\ &= k \cdot (aei + bfg + cdh - hfa - idb - gec) \end{aligned}$$

and the other two are similar. These make us suspect that multiplying a row by k multiplies a determinant by k. As before, this modifies our plan but does not wreck it. We are asking only that the zero-ness of the determinant formula be unchanged and we are not focusing on its sign or magnitude.

So our plan from the start of this exploration is modified in some inessential 
ways and is now: we will look for determinant functions that remain unchanged 
under the operation of row combination, that change sign on a row swap, that 
rescale on the rescaling of a row, and such that the determinant of an echelon 
form matrix is the product down the diagonal. In the next two subsections we 
will see that for each n there is such an nxn determinant function, and that for 
each n there is only one such function. 
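
The modified plan can be spot-checked numerically. The sketch below is only an illustration: the helper names (`combine`, `swap`, `rescale`, `check_plan`) are hypothetical, and it tests the 2x2 and 3x3 formulas on random integer matrices rather than proving anything.

```python
import random

def det2(m):
    """The 2x2 formula ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c

def det3(m):
    """The 3x3 formula aei + bfg + cdh - hfa - idb - gec."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*e*i + b*f*g + c*d*h - h*f*a - i*d*b - g*e*c

def combine(m, k, i, j):
    """Return a copy of m with k times row i added to row j."""
    out = [list(row) for row in m]
    out[j] = [k * x + y for x, y in zip(m[i], m[j])]
    return out

def swap(m, i, j):
    """Return a copy of m with rows i and j exchanged."""
    out = [list(row) for row in m]
    out[i], out[j] = out[j], out[i]
    return out

def rescale(m, k, i):
    """Return a copy of m with row i multiplied by k."""
    out = [list(row) for row in m]
    out[i] = [k * x for x in m[i]]
    return out

def check_plan(det, n, trials=200, seed=0):
    """Test the three row-operation behaviors on random integer matrices."""
    rng = random.Random(seed)
    for _ in range(trials):
        m = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]
        k = rng.randint(-5, 5)
        i, j = rng.sample(range(n), 2)
        if det(combine(m, k, i, j)) != det(m):
            return False            # combining should leave the value alone
        if det(swap(m, i, j)) != -det(m):
            return False            # swapping should change the sign
        if det(rescale(m, k, i)) != k * det(m):
            return False            # rescaling should multiply by k
    return True

print(check_plan(det2, 2), check_plan(det3, 3))
```

Integer entries keep the arithmetic exact, so any failure would be a genuine counterexample and not a rounding effect.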

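Finally, for the next subsection note that scalars factor out of a row without affecting other rows: here

$$\det\begin{pmatrix} 3 & 3 & 9 \\ 2 & 1 & 1 \\ 5 & 10 & -5 \end{pmatrix} = 3 \cdot \det\begin{pmatrix} 1 & 1 & 3 \\ 2 & 1 & 1 \\ 5 & 10 & -5 \end{pmatrix}$$
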
the 3 comes out of the top row only, leaving the other rows unchanged. So in the definition of determinant we will write it as a function of the rows $\det(\rho_1, \rho_2, \ldots, \rho_n)$, not as $\det(T)$ or as a function of the entries $\det(t_{1,1}, \ldots, t_{n,n})$.

Exercises

/ 1.1 Evaluate the determinant of each. 



3 1 
-1 1 



(b) 



















(■ 


1 


i) 


(c) 


(i 





-i) 













3 





2 
-1 3 




1.2 Evaluate the determinant of each. 

/2 1 P 

(b) 5 -2 

\1 -3 4J 

/ 1.3 Verify that the determinant of an upper-triangular 3x3 matrix is the product down the diagonal.

$$\det\begin{pmatrix} a & b & c \\ 0 & e & f \\ 0 & 0 & i \end{pmatrix} = aei$$

Do lower-triangular matrices work the same way?
/ 1.4 Use the determinant to decide if each is singular or nonsingular. 

(3 ;) c J) <=i G 

1.5 Singular or nonsingular? Use the determinant to decide. 

/2 in no n /2 i ox 

(a) 3 2 2 (b) 2 1 1 (c) 3 -2 
\0 14/ \4 1 3/ Vl 0/ 

/ 1.6 Each pair of matrices differs by one row operation. Use this operation to compare
det(A) with det(B).




3\ 



b c h = (b-Q)(c-Q)(c-b) 
b^ cV 

/ 1.8 Which real numbers x make this matrix singular? 

'12-x 4 



1.9 Do the Gaussian reduction to check the formula for 3x3 matrices stated in the preamble to this section.

$$\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \text{ is nonsingular iff } aei + bfg + cdh - hfa - idb - gec \neq 0$$

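1.10 Show that the equation of a line in $\mathbb{R}^2$ thru $(x_1, y_1)$ and $(x_2, y_2)$ is given by this determinant.

$$\det\begin{pmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{pmatrix} = 0 \qquad x_1 \neq x_2$$

/ 1.11 Many people know this mnemonic for the determinant of a 3x3 matrix: first repeat the first two columns and then sum the products on the forward diagonals and subtract the products on the backward diagonals. That is, first write

$$\left(\begin{array}{ccc|cc} h_{1,1} & h_{1,2} & h_{1,3} & h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} & h_{2,3} & h_{2,1} & h_{2,2} \\ h_{3,1} & h_{3,2} & h_{3,3} & h_{3,1} & h_{3,2} \end{array}\right)$$

and then calculate this.

$$\begin{aligned} & h_{1,1}h_{2,2}h_{3,3} + h_{1,2}h_{2,3}h_{3,1} + h_{1,3}h_{2,1}h_{3,2} \\ & - h_{3,1}h_{2,2}h_{1,3} - h_{3,2}h_{2,3}h_{1,1} - h_{3,3}h_{2,1}h_{1,2} \end{aligned}$$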

(a) Check that this agrees with the formula given in the preamble to this section. 

(b) Does it extend to other-sized determinants? 
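1.12 The cross product of the vectors

$$\vec{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \qquad \vec{y} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}$$

is the vector computed as this determinant.

$$\vec{x} \times \vec{y} = \det\begin{pmatrix} \vec{e}_1 & \vec{e}_2 & \vec{e}_3 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{pmatrix}$$

Note that the first row's entries are vectors, the vectors from the standard basis for $\mathbb{R}^3$. Show that the cross product of two vectors is perpendicular to each vector.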

1.13 Prove that each statement holds for 2x2 matrices.

(a) The determinant of a product is the product of the determinants $\det(ST) = \det(S) \cdot \det(T)$.

(b) If T is invertible then the determinant of the inverse is the inverse of the determinant $\det(T^{-1}) = (\det(T))^{-1}$.

Matrices T and T' are similar if there is a nonsingular matrix P such that $T' = PTP^{-1}$. (This definition is in Chapter Five.) Show that similar 2x2 matrices have the same determinant.

/ 1.14 Prove that the area of this region in the plane 




is equal to the value of this determinant 

det 

Compare with this. 



detf 



X2 

y2 



X2 
V2 



1.15 Prove that for 2x2 matrices, the determinant of a matrix equals the determinant of its transpose. Does that also hold for 3x3 matrices?
/ 1.16 Is the determinant function linear, that is, is $\det(x \cdot T + y \cdot S) = x \cdot \det(T) + y \cdot \det(S)$?
1.17 Show that if A is 3x3 then $\det(c \cdot A) = c^3 \cdot \det(A)$ for any scalar c.
1.18 Which real numbers $\theta$ make

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

singular? Explain geometrically.
? 1.19 [Am. Math. Mon., Apr. 1955] If a third order determinant has elements 1, 2, 
. . . , 9, what is the maximum value it may have? 



1.2 Properties of Determinants 

We want a formula to determine whether an nxn matrix is nonsingular. We will not begin by stating such a formula. Instead, we will begin by considering the function that such a formula calculates. We will define this function by its properties, then prove that the function with these properties exists and is unique, and also describe formulas that compute this function. (Because we will eventually show that the function exists and is unique, from the start we will say 'det(T)' instead of 'if there is a unique determinant function then det(T)'.)

2.1 Definition An nxn determinant is a function $\det\colon \mathcal{M}_{n\times n} \to \mathbb{R}$ such that

(1) $\det(\rho_1, \ldots, k \cdot \rho_i + \rho_j, \ldots, \rho_n) = \det(\rho_1, \ldots, \rho_j, \ldots, \rho_n)$ for $i \neq j$

(2) $\det(\rho_1, \ldots, \rho_j, \ldots, \rho_i, \ldots, \rho_n) = -\det(\rho_1, \ldots, \rho_i, \ldots, \rho_j, \ldots, \rho_n)$ for $i \neq j$

(3) $\det(\rho_1, \ldots, k\rho_i, \ldots, \rho_n) = k \cdot \det(\rho_1, \ldots, \rho_i, \ldots, \rho_n)$ for any scalar k

(4) $\det(I) = 1$ where I is an identity matrix

(the $\rho$'s are the rows of the matrix). We often write |T| for det(T).

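2.2 Remark Property (2) is redundant since

$$T \xrightarrow{\rho_i + \rho_j} \; \xrightarrow{-\rho_j + \rho_i} \; \xrightarrow{\rho_i + \rho_j} \; \xrightarrow{-\rho_i} \hat{T}$$

swaps rows i and j. We have listed it only for convenience.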
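
To see the redundancy concretely, here is a small sketch that applies those four operations to a list of rows. The function names are hypothetical and rows are indexed from 0. The three combinations leave a determinant unchanged by property (1) and the final rescaling by -1 negates it by property (3), which is the sign change that property (2) asserts.

```python
def combine(rows, k, i, j):
    """Add k times row i to row j, returning a new list of rows."""
    out = [list(r) for r in rows]
    out[j] = [k * x + y for x, y in zip(rows[i], rows[j])]
    return out

def rescale(rows, k, i):
    """Multiply row i by k, returning a new list of rows."""
    out = [list(r) for r in rows]
    out[i] = [k * x for x in rows[i]]
    return out

def swap_via_other_ops(rows, i, j):
    """Swap rows i and j using only combining and rescaling."""
    rows = combine(rows, 1, i, j)     # rho_i + rho_j
    rows = combine(rows, -1, j, i)    # -rho_j + rho_i
    rows = combine(rows, 1, i, j)     # rho_i + rho_j
    rows = rescale(rows, -1, i)       # -rho_i
    return rows

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(swap_via_other_ops(m, 0, 2))    # first and last rows exchanged
```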

In Gauss's Method the operation of multiplying a row by a constant k had a restriction that $k \neq 0$. Property (3) does not have the restriction because the next result shows that we do not need it here.

The previous subsection's plan asks for the determinant of an echelon form 
matrix to be the product down the diagonal. The next result shows that in the 
presence of the other three conditions, property (4) gives that. 

2.3 Lemma A matrix with two identical rows has a determinant of zero. A matrix with a zero row has a determinant of zero. A matrix is nonsingular if and only if its determinant is nonzero. The determinant of an echelon form matrix is the product down its diagonal.

Proof To verify the first sentence, swap the two equal rows. The sign of the determinant changes but the matrix is the same and so its determinant is the same. Thus the determinant is zero.

The second sentence follows from property (3). Multiply the zero row by two. That doubles the determinant but it also leaves the row unchanged and hence leaves the determinant unchanged. Thus the determinant must be zero.

For the third sentence, where $T \to \cdots \to \hat{T}$ is the Gauss-Jordan reduction, by the definition the determinant of T is zero if and only if the determinant of $\hat{T}$ is zero (although the two could differ in sign or magnitude). A nonsingular T Gauss-Jordan reduces to an identity matrix and so has a nonzero determinant. A singular T reduces to a $\hat{T}$ with a zero row; by the second sentence of this lemma its determinant is zero.

The fourth sentence has two cases. If the echelon form matrix is singular 
then it has a zero row. Thus it has a zero on its diagonal, so the product down 
its diagonal is zero. By the third sentence the determinant is zero and therefore 
this matrix's determinant equals the product down its diagonal. 

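If the echelon form matrix is nonsingular then none of its diagonal entries is zero so we can use property (3) to get 1's on the diagonal (again, the vertical bars $|\cdots|$ indicate the determinant operation).

$$\begin{vmatrix} t_{1,1} & t_{1,2} & \cdots & t_{1,n} \\ 0 & t_{2,2} & \cdots & t_{2,n} \\ & & \ddots & \\ 0 & & & t_{n,n} \end{vmatrix} = t_{1,1} \cdot t_{2,2} \cdots t_{n,n} \cdot \begin{vmatrix} 1 & t_{1,2}/t_{1,1} & \cdots & t_{1,n}/t_{1,1} \\ 0 & 1 & \cdots & t_{2,n}/t_{2,2} \\ & & \ddots & \\ 0 & & & 1 \end{vmatrix}$$

Then the Jordan half of Gauss-Jordan elimination, using property (1) of the definition, leaves the identity matrix.

$$= t_{1,1} \cdot t_{2,2} \cdots t_{n,n} \cdot \begin{vmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ & & \ddots & \\ 0 & & & 1 \end{vmatrix} = t_{1,1} \cdot t_{2,2} \cdots t_{n,n} \cdot 1$$

So in this case also, the determinant is the product down the diagonal. QED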

That gives us a way to compute the value of a determinant function on a 
matrix: do Gaussian reduction, keeping track of any changes of sign caused by 
row swaps and any scalars that we factor out, and finish by multiplying down 
the diagonal of the echelon form result. This algorithm is just as fast as Gauss's 
Method and so practical on all of the matrices that we will see. 
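
The procedure just described can be sketched in a few lines. This is one possible rendering, not a canonical implementation: the name `det_gauss` is hypothetical, exact `Fraction` arithmetic is an assumption made to avoid rounding, and the sketch uses only row combinations and swaps, so there are no factored-out scalars to track.

```python
from fractions import Fraction

def det_gauss(matrix):
    """Determinant by Gaussian reduction, tracking sign changes from row swaps."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n = len(rows)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot: the matrix is singular
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            sign = -sign                # a row swap changes the sign
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            # Row combination: leaves the determinant unchanged.
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    product = Fraction(sign)
    for i in range(n):
        product *= rows[i][i]           # multiply down the diagonal
    return product

# This matrix needs one row swap during the reduction.
print(det_gauss([[2, 2, 6], [4, 4, 3], [0, -3, 5]]))   # -54
```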

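2.4 Example Doing 2x2 determinants with Gauss's Method

$$\begin{vmatrix} 2 & 4 \\ -1 & 3 \end{vmatrix} = \begin{vmatrix} 2 & 4 \\ 0 & 5 \end{vmatrix} = 10$$

doesn't give a big time savings because the 2x2 determinant formula is easy. However, a 3x3 determinant is often easier to calculate with Gauss's Method than with the formula given earlier.

$$\begin{vmatrix} 2 & 2 & 6 \\ 4 & 4 & 3 \\ 0 & -3 & 5 \end{vmatrix} = \begin{vmatrix} 2 & 2 & 6 \\ 0 & 0 & -9 \\ 0 & -3 & 5 \end{vmatrix} = -\begin{vmatrix} 2 & 2 & 6 \\ 0 & -3 & 5 \\ 0 & 0 & -9 \end{vmatrix} = -54$$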



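2.5 Example Determinants bigger than 3x3 go quickly with the Gauss's Method procedure.

$$\begin{vmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & 1 & 4 \\ 0 & 0 & 0 & 5 \\ 0 & 1 & 0 & 1 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & 1 & 4 \\ 0 & 0 & 0 & 5 \\ 0 & 0 & -1 & -3 \end{vmatrix} = -\begin{vmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & 1 & 4 \\ 0 & 0 & -1 & -3 \\ 0 & 0 & 0 & 5 \end{vmatrix} = -(-5) = 5$$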



The prior example illustrates an important point. Although we have not yet found a 4x4 determinant formula, if one exists then we know what value it gives to the matrix: if there is a function with properties (1)-(4) then on the above matrix the function must return 5.



2.6 Lemma For each n, if there is an nxn determinant function then it is unique. 

Proof Perform Gauss's Method on the matrix, keeping track of how the sign 
alternates on row swaps, and then get the value by multiplying down the diagonal 
of the echelon form result. By the definition and the lemma, all nxn determinant 
functions must return this value on the matrix. QED 

The 'if there is an nxn determinant function' emphasizes that although we 
can use Gauss's Method to compute the only value that a determinant function 
could possibly return, we haven't yet shown that such a function exists for all n. 
In the rest of the section we will do that. 

Exercises 

For these, assume that an nxn determinant function exists for all n. 
/ 2.7 Use Gauss's Method to find each determinant. 



(b) 



2.8 Use Gauss's Method to find each. 

2-1 ' ' ° 

(b) 



(a) 



1 



3 2 
5 2 2 

2.9 For which values of k does this system have a unique solution?

$$\begin{aligned} x + z - w &= 2 \\ y - 2z &= 3 \\ x + kz &= 4 \\ z - w &= 2 \end{aligned}$$

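/ 2.10 Express each of these in terms of |H|.

(a) $\begin{vmatrix} h_{3,1} & h_{3,2} & h_{3,3} \\ h_{2,1} & h_{2,2} & h_{2,3} \\ h_{1,1} & h_{1,2} & h_{1,3} \end{vmatrix}$

(b) $\begin{vmatrix} -h_{1,1} & -h_{1,2} & -h_{1,3} \\ -2h_{2,1} & -2h_{2,2} & -2h_{2,3} \\ -3h_{3,1} & -3h_{3,2} & -3h_{3,3} \end{vmatrix}$

(c) $\begin{vmatrix} h_{1,1} + h_{3,1} & h_{1,2} + h_{3,2} & h_{1,3} + h_{3,3} \\ h_{2,1} & h_{2,2} & h_{2,3} \\ 5h_{3,1} & 5h_{3,2} & 5h_{3,3} \end{vmatrix}$
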



/ 2.11 Find the determinant of a diagonal matrix. 

2.12 Describe the solution set of a homogeneous linear system if the determinant of 
the matrix of coefficients is nonzero. 
/ 2.13 Show that this determinant is zero. 

y + z x + z x + y 



2.14 (a) Find the 1x1, 2x2, and 3x3 matrices with i, j entry given by $(-1)^{i+j}$.
(b) Find the determinant of the square matrix with i, j entry $(-1)^{i+j}$.

2.15 (a) Find the 1x1, 2x2, and 3x3 matrices with i, j entry given by i + j.
(b) Find the determinant of the square matrix with i, j entry i + j.

/ 2.16 Show that determinant functions are not linear by giving a case where $|A + B| \neq |A| + |B|$.


2.17 The second condition in the definition, that row swaps change the sign of a 
determinant, is somewhat annoying. It means we have to keep track of the number 
of swaps, to compute how the sign alternates. Can we get rid of it? Can we replace 
it with the condition that row swaps leave the determinant unchanged? (If so 
then we would need new 1x1, 2x2, and 3x3 formulas, but that would be a minor 
matter.) 

2.18 Prove that the determinant of any triangular matrix, upper or lower, is the 
product down its diagonal. 

2.19 Refer to the definition of elementary matrices in the Mechanics of Matrix 
Multiplication subsection. 

(a) What is the determinant of each kind of elementary matrix? 

(b) Prove that if E is any elementary matrix then |ES| = |E||S| for any appropriately 
sized S. 

(c) (This question doesn't involve determinants.) Prove that if T is singular 
then a product TS is also singular. 

(d) Show that |TS| = |T||S|.

(e) Show that if T is nonsingular then $|T^{-1}| = |T|^{-1}$.

2.20 Prove that the determinant of a product is the product of the determinants |TS| = |T||S| in this way. Fix the nxn matrix S and consider the function $d\colon \mathcal{M}_{n\times n} \to \mathbb{R}$ given by $T \mapsto |TS|/|S|$.

(a) Check that d satisfies property (1) in the definition of a determinant function. 

(b) Check property (2). 

(c) Check property (3). 

(d) Check property (4). 

(e) Conclude the determinant of a product is the product of the determinants. 

2.21 A submatrix of a given matrix A is one that we get by deleting some of the 
rows and columns of A. Thus, the first matrix here is a submatrix of the second. 



X 



V 



z 



|A| + |B|. 
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Prove that for any square matrix, the rank of the matrix is r if and only if r is the 
largest integer such that there is an rxr submatrix with a nonzero determinant. 
/ 2.22 Prove that a matrix with rational entries has a rational determinant. 
? 2.23 [Am. Math. Mon., Feb. 1953] Find the element of likeness in (a) simplifying a 
fraction, (b) powdering the nose, (c) building new steps on the church, (d) keeping 
emeritus professors on campus, (e) putting B, C, D in the determinant 



1 


Q 


q2 


q3 


q3 


1 


Q 




B 


q3 


1 


Q 


C 


D 


q3 


1 



1.3 The Permutation Expansion 

The prior subsection defines a function to be a determinant if it satisfies four 
conditions and shows that there is at most one n x n determinant function for 
each n. What is left is to show that for each n such a function exists. 

But, we easily compute determinants: use Gauss's Method, keeping track of 
the sign changes from row swaps, and end by multiplying down the diagonal. 
So how could such a function not exist? 

The difficulty is to show that the computation gives a well-defined (that is, unique) result. Consider these two Gauss's Method reductions of the same matrix, the first without any row swap

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \xrightarrow{-3\rho_1 + \rho_2} \begin{pmatrix} 1 & 2 \\ 0 & -2 \end{pmatrix}$$

and the second with one.

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \xrightarrow{\rho_1 \leftrightarrow \rho_2} \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix} \xrightarrow{-(1/3)\rho_1 + \rho_2} \begin{pmatrix} 3 & 4 \\ 0 & 2/3 \end{pmatrix}$$

Both yield the determinant -2 since in the second one we note that the row swap changes the sign of the result we get by multiplying down the diagonal. To illustrate how a computation that is like the ones that we are doing could fail to be well-defined, suppose that Definition 2.1 did not include condition (2). That is, suppose that we instead tried to define determinants so that the value would not change on a row swap. Then the first reduction above would yield -2 while the second would yield +2. We could still do computations but they wouldn't give consistent outcomes: there is no function that satisfies conditions (1), (3), (4), and also this altered second condition.

Of course, observing that Definition 2.1 does the right thing with these two reductions of the above matrix is not enough. That is, the way that we have given to compute determinant values does not plainly eliminate the possibility that there might be, say, two reductions of some 7x7 matrix that lead to different determinant value outputs. In that case we would not have a function, since the definition of a function is that for each input there must be exactly one output. In the rest of this section we will show that there is never a conflict.
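
The two reductions of that 2x2 matrix can be replayed in code. The sketch below is illustrative only: `reduce_and_multiply` is a hypothetical helper that handles just the 2x2 case and assumes the upper-left entry is nonzero after the optional swap. Its `flip_sign_on_swap` flag imitates the altered second condition.

```python
from fractions import Fraction

def reduce_and_multiply(matrix, swap_first, flip_sign_on_swap=True):
    """Reduce a 2x2 matrix to echelon form and multiply down the diagonal.

    With swap_first the two rows are exchanged before reducing.  With
    flip_sign_on_swap set to False the swap is treated as not changing
    the value, as in the altered condition discussed above.
    """
    rows = [[Fraction(x) for x in r] for r in matrix]
    sign = 1
    if swap_first:
        rows[0], rows[1] = rows[1], rows[0]
        if flip_sign_on_swap:
            sign = -sign
    factor = rows[1][0] / rows[0][0]      # assumes rows[0][0] is nonzero
    rows[1] = [x - factor * y for x, y in zip(rows[1], rows[0])]
    return sign * rows[0][0] * rows[1][1]

m = [[1, 2], [3, 4]]
print(reduce_and_multiply(m, swap_first=False))                           # -2
print(reduce_and_multiply(m, swap_first=True))                            # -2
print(reduce_and_multiply(m, swap_first=True, flip_sign_on_swap=False))   # 2
```

With the sign tracked, both routes agree. With the altered condition the two routes disagree, which is the inconsistency that the text describes.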

To do this we will define an alternative way to find the value of a determinant. 
(This new way is less useful in practice since it makes the computations awkward 
and slow, which is why we didn't start with it. But it is useful for theory and 
it makes the proof that we need easier.) The key idea is in property (3) of 
Definition 2.1. It shows that the determinant function is not linear. 

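3.1 Example For this matrix

$$A = \begin{pmatrix} 2 & 1 \\ -1 & 3 \end{pmatrix}$$

$\det(2A) \neq 2 \cdot \det(A)$. Instead, scalars come out of each of the rows separately.

$$\begin{vmatrix} 4 & 2 \\ -2 & 6 \end{vmatrix} = 2 \cdot \begin{vmatrix} 2 & 1 \\ -2 & 6 \end{vmatrix} = 4 \cdot \begin{vmatrix} 2 & 1 \\ -1 & 3 \end{vmatrix}$$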

Since scalars come out a row at a time, we might guess that determinants 
are linear a row at a time. 

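3.2 Definition Let V be a vector space. A map $f\colon V^n \to \mathbb{R}$ is multilinear if

(1) $f(\rho_1, \ldots, \vec{v} + \vec{w}, \ldots, \rho_n) = f(\rho_1, \ldots, \vec{v}, \ldots, \rho_n) + f(\rho_1, \ldots, \vec{w}, \ldots, \rho_n)$

(2) $f(\rho_1, \ldots, k\vec{v}, \ldots, \rho_n) = k \cdot f(\rho_1, \ldots, \vec{v}, \ldots, \rho_n)$

for $\vec{v}, \vec{w} \in V$ and $k \in \mathbb{R}$.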

3.3 Lemma Determinants are multilinear. 

Proof Property (2) here is just condition (3) in Definition 2.1 so we need only 
verify property (1). 

There are two cases. If the set of other rows $\{\rho_1, \ldots, \rho_{i-1}, \rho_{i+1}, \ldots, \rho_n\}$ is linearly dependent then all three matrices are singular and so all three determinants are zero and the equality is trivial.

Therefore assume that the set of other rows is linearly independent. This set has n - 1 members so we can make a basis by adding one more vector $\langle \rho_1, \ldots, \rho_{i-1}, \vec{\beta}, \rho_{i+1}, \ldots, \rho_n \rangle$. Express $\vec{v}$ and $\vec{w}$ with respect to this basis

$$\begin{aligned} \vec{v} &= v_1\rho_1 + \cdots + v_{i-1}\rho_{i-1} + v_i\vec{\beta} + v_{i+1}\rho_{i+1} + \cdots + v_n\rho_n \\ \vec{w} &= w_1\rho_1 + \cdots + w_{i-1}\rho_{i-1} + w_i\vec{\beta} + w_{i+1}\rho_{i+1} + \cdots + w_n\rho_n \end{aligned}$$

and add.

$$\vec{v} + \vec{w} = (v_1 + w_1)\rho_1 + \cdots + (v_i + w_i)\vec{\beta} + \cdots + (v_n + w_n)\rho_n$$

Consider the left side of property (1) and expand $\vec{v} + \vec{w}$.

$$\det(\rho_1, \ldots, (v_1 + w_1)\rho_1 + \cdots + (v_i + w_i)\vec{\beta} + \cdots + (v_n + w_n)\rho_n, \ldots, \rho_n) \qquad (*)$$

By the definition of determinant's condition (1), the value of (*) is unchanged by the operation of adding $-(v_1 + w_1)\rho_1$ to the i-th row $\vec{v} + \vec{w}$. The i-th row becomes this.

$$\vec{v} + \vec{w} - (v_1 + w_1)\rho_1 = (v_2 + w_2)\rho_2 + \cdots + (v_i + w_i)\vec{\beta} + \cdots + (v_n + w_n)\rho_n$$

Next add $-(v_2 + w_2)\rho_2$, etc., to eliminate all of the terms from the other rows. Apply the definition of determinant's condition (3).

$$\begin{aligned} \det(\rho_1, \ldots, \vec{v} + \vec{w}, \ldots, \rho_n) &= \det(\rho_1, \ldots, (v_i + w_i) \cdot \vec{\beta}, \ldots, \rho_n) \\ &= (v_i + w_i) \cdot \det(\rho_1, \ldots, \vec{\beta}, \ldots, \rho_n) \\ &= v_i \cdot \det(\rho_1, \ldots, \vec{\beta}, \ldots, \rho_n) + w_i \cdot \det(\rho_1, \ldots, \vec{\beta}, \ldots, \rho_n) \end{aligned}$$

Now this is a sum of two determinants. To finish, bring $v_i$ and $w_i$ back inside in front of the $\vec{\beta}$'s and use row combinations again, this time to reconstruct the expressions of $\vec{v}$ and $\vec{w}$ in terms of the basis. That is, start with the operations of adding $v_1\rho_1$ to $v_i\vec{\beta}$ and $w_1\rho_1$ to $w_i\vec{\beta}$, etc., to get the expansions of $\vec{v}$ and $\vec{w}$. QED

Multilinearity allows us to expand a determinant into a sum of determinants, 
each of which involves a simple matrix. 
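
As a quick numerical illustration of splitting along one row, the sketch below uses the 3x3 formula from the start of this section. The matrix and the helper names `det3` and `with_row` are chosen only for the example; the check is a spot test, not a proof.

```python
def det3(m):
    """The 3x3 formula aei + bfg + cdh - hfa - idb - gec."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*e*i + b*f*g + c*d*h - h*f*a - i*d*b - g*e*c

def with_row(m, index, row):
    """Return a copy of m whose row `index` is replaced by `row`."""
    out = [list(r) for r in m]
    out[index] = list(row)
    return out

m = [[2, 1, -1], [4, 3, 0], [2, 1, 5]]
v, w = [2, 0, 0], [0, 1, -1]            # v + w is the first row of m

# Additivity in the first row: f(v + w, ...) = f(v, ...) + f(w, ...)
lhs = det3(with_row(m, 0, [x + y for x, y in zip(v, w)]))
rhs = det3(with_row(m, 0, v)) + det3(with_row(m, 0, w))
print(lhs, rhs)                          # 12 12

# Homogeneity in the first row: f(k v, ...) = k f(v, ...)
k = 7
print(det3(with_row(m, 0, [k * x for x in v])) == k * det3(with_row(m, 0, v)))
```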

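3.4 Example Use property (1) of multilinearity to break up the first row

$$\begin{vmatrix} 2 & 1 \\ 4 & 3 \end{vmatrix} = \begin{vmatrix} 2 & 0 \\ 4 & 3 \end{vmatrix} + \begin{vmatrix} 0 & 1 \\ 4 & 3 \end{vmatrix}$$

and then break each of those two along the second row.

$$= \begin{vmatrix} 2 & 0 \\ 4 & 0 \end{vmatrix} + \begin{vmatrix} 2 & 0 \\ 0 & 3 \end{vmatrix} + \begin{vmatrix} 0 & 1 \\ 4 & 0 \end{vmatrix} + \begin{vmatrix} 0 & 1 \\ 0 & 3 \end{vmatrix}$$

We are left with four determinants such that in each row of each of the four there is a single entry from the original matrix.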

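3.5 Example In the same way, a 3x3 determinant separates into a sum of many simpler determinants. Splitting along the first row produces three determinants (we have highlighted the zero in the 2, 3 position to set it off visually from the zeroes that appear as part of the splitting).

$$\begin{vmatrix} 2 & 1 & -1 \\ 4 & 3 & \mathbf{0} \\ 2 & 1 & 5 \end{vmatrix} = \begin{vmatrix} 2 & 0 & 0 \\ 4 & 3 & \mathbf{0} \\ 2 & 1 & 5 \end{vmatrix} + \begin{vmatrix} 0 & 1 & 0 \\ 4 & 3 & \mathbf{0} \\ 2 & 1 & 5 \end{vmatrix} + \begin{vmatrix} 0 & 0 & -1 \\ 4 & 3 & \mathbf{0} \\ 2 & 1 & 5 \end{vmatrix}$$

Each of these splits in three along the second row. Each of the nine splits in three along the third row, resulting in twenty seven determinants such that each row contains a single entry from the starting matrix.

$$= \begin{vmatrix} 2 & 0 & 0 \\ 4 & 0 & 0 \\ 2 & 0 & 0 \end{vmatrix} + \begin{vmatrix} 2 & 0 & 0 \\ 4 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix} + \begin{vmatrix} 2 & 0 & 0 \\ 4 & 0 & 0 \\ 0 & 0 & 5 \end{vmatrix} + \begin{vmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 2 & 0 & 0 \end{vmatrix} + \cdots + \begin{vmatrix} 0 & 0 & -1 \\ 0 & 0 & \mathbf{0} \\ 0 & 0 & 5 \end{vmatrix}$$

So with multilinearity, an nxn determinant expands into a sum of $n^n$ determinants where each row of each summand contains a single entry from the starting matrix. However, many of these summand determinants are zero.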

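3.6 Example In each of these examples from the prior expansion, two of the entries from the original matrix are in the same column.

$$\begin{vmatrix} 2 & 0 & 0 \\ 4 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix} \qquad \begin{vmatrix} 0 & 0 & -1 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{vmatrix} \qquad \begin{vmatrix} 0 & 1 & 0 \\ 0 & 0 & \mathbf{0} \\ 0 & 0 & 5 \end{vmatrix}$$

For instance in the first matrix, the 2 and the 4 both come from the first column of the original matrix. Any such matrix is singular because one row is a multiple of the other. Thus, any such determinant is zero, by Lemma 2.3.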

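With that observation the above expansion of the 3x3 determinant into the sum of the twenty seven determinants simplifies to the sum of these six, the ones where the entries from the original matrix come not just one per row but also one per column.

$$\begin{aligned} \begin{vmatrix} 2 & 1 & -1 \\ 4 & 3 & \mathbf{0} \\ 2 & 1 & 5 \end{vmatrix} &= \begin{vmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{vmatrix} + \begin{vmatrix} 2 & 0 & 0 \\ 0 & 0 & \mathbf{0} \\ 0 & 1 & 0 \end{vmatrix} \\ &\quad + \begin{vmatrix} 0 & 1 & 0 \\ 4 & 0 & 0 \\ 0 & 0 & 5 \end{vmatrix} + \begin{vmatrix} 0 & 1 & 0 \\ 0 & 0 & \mathbf{0} \\ 2 & 0 & 0 \end{vmatrix} \\ &\quad + \begin{vmatrix} 0 & 0 & -1 \\ 4 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix} + \begin{vmatrix} 0 & 0 & -1 \\ 0 & 3 & 0 \\ 2 & 0 & 0 \end{vmatrix} \end{aligned}$$

We can bring out the scalars.

$$\begin{aligned} &= (2)(3)(5)\begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} + (2)(\mathbf{0})(1)\begin{vmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{vmatrix} \\ &\quad + (1)(4)(5)\begin{vmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{vmatrix} + (1)(\mathbf{0})(2)\begin{vmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{vmatrix} \\ &\quad + (-1)(4)(1)\begin{vmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix} + (-1)(3)(2)\begin{vmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{vmatrix} \end{aligned}$$

To finish, we evaluate those six determinants by row-swapping them to the identity matrix, keeping track of the sign changes.

$$\begin{aligned} &= 30 \cdot (+1) + 0 \cdot (-1) \\ &\quad + 20 \cdot (-1) + 0 \cdot (+1) \\ &\quad - 4 \cdot (+1) - 6 \cdot (-1) = 12 \end{aligned}$$

That example captures the new calculation scheme. Multilinearity gives us many separate determinants, each with one entry per row from the original matrix. Most of these have one row that is a multiple of another so we can omit them. We are left with those determinants that have one entry per row and column from the original matrix. By factoring out the scalars we can further reduce the determinants that we must compute to those one-entry-per-row-and-column matrices where all the entries are 1's.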
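
The counting in that scheme is easy to reproduce. In the sketch below (function names hypothetical, columns indexed from 0), a term of the full expansion is recorded as the tuple of columns chosen in rows 1 through n, and the survivors are the terms that use each column exactly once.

```python
from itertools import product

def expansion_terms(n):
    """All ways to pick one column in each of the n rows."""
    return list(product(range(n), repeat=n))

def surviving_terms(n):
    """The picks that use every column exactly once."""
    return [cols for cols in expansion_terms(n) if len(set(cols)) == n]

for n in (2, 3, 4):
    print(n, len(expansion_terms(n)), len(surviving_terms(n)))
# 2 4 2
# 3 27 6
# 4 256 24
```

For n = 3 this gives the twenty seven determinants of the full expansion and the six that remain.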

Recall Definition Three.IV.3.15, that a permutation matrix is square and all entries are 0's except for a 1 in each row and column. We now introduce a notation for permutation matrices.

3.7 Definition An n-permutation is a function on the first n positive integers $\phi\colon \{1, \ldots, n\} \to \{1, \ldots, n\}$ that is one-to-one and onto.

In other words, in a permutation each number 1, ..., n appears as output for one and only one input. Alternatively, we may denote a permutation as the sequence $\phi = \langle \phi(1), \phi(2), \ldots, \phi(n) \rangle$.

3.8 Example The 2-permutations are the functions $\phi_1\colon \{1,2\} \to \{1,2\}$ given by $\phi_1(1) = 1$, $\phi_1(2) = 2$, and $\phi_2\colon \{1,2\} \to \{1,2\}$ given by $\phi_2(1) = 2$, $\phi_2(2) = 1$. The sequence notation is shorter: $\phi_1 = \langle 1,2 \rangle$ and $\phi_2 = \langle 2,1 \rangle$.

In the sequence notation the 3-permutations are $\phi_1 = \langle 1,2,3 \rangle$, $\phi_2 = \langle 1,3,2 \rangle$, $\phi_3 = \langle 2,1,3 \rangle$, $\phi_4 = \langle 2,3,1 \rangle$, $\phi_5 = \langle 3,1,2 \rangle$, and $\phi_6 = \langle 3,2,1 \rangle$.

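Let $\iota_j$ be the row vector that is all 0's except for a 1 in entry j, so that the four-wide $\iota_2$ is $(0 \;\; 1 \;\; 0 \;\; 0)$. Then our notation will associate permutations with permutation matrices in this way: with any $\phi = \langle \phi(1), \ldots, \phi(n) \rangle$ associate the matrix whose rows are $\iota_{\phi(1)}, \ldots, \iota_{\phi(n)}$. For instance, associated with the 4-permutation $\phi = \langle 3, 2, 1, 4 \rangle$ we have the matrix whose rows are the corresponding $\iota$'s.

$$P_\phi = \begin{pmatrix} \iota_3 \\ \iota_2 \\ \iota_1 \\ \iota_4 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$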
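
The association between a permutation and its matrix is mechanical, as this short sketch shows. The function names are hypothetical; permutations are written as tuples of the values 1 through n, matching the sequence notation.

```python
def iota(j, n):
    """The n-wide row vector that is all 0's except for a 1 in entry j (1-based)."""
    return [1 if col == j else 0 for col in range(1, n + 1)]

def permutation_matrix(phi):
    """The matrix whose i-th row is iota_{phi(i)}."""
    n = len(phi)
    return [iota(phi[i], n) for i in range(n)]

for row in permutation_matrix((3, 2, 1, 4)):
    print(row)
# [0, 0, 1, 0]
# [0, 1, 0, 0]
# [1, 0, 0, 0]
# [0, 0, 0, 1]
```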







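3.9 Example These are the permutation matrices for the 2-permutations listed in Example 3.8.

$$P_{\phi_1} = \begin{pmatrix} \iota_1 \\ \iota_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad P_{\phi_2} = \begin{pmatrix} \iota_2 \\ \iota_1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

For instance, $P_{\phi_2}$'s first row is $\iota_{\phi_2(1)} = \iota_2$ and its second is $\iota_{\phi_2(2)} = \iota_1$.

Consider the 3-permutation $\phi_5 = \langle 3, 1, 2 \rangle$. The permutation matrix $P_{\phi_5}$ has rows $\iota_{\phi_5(1)} = \iota_3$, $\iota_{\phi_5(2)} = \iota_1$, and $\iota_{\phi_5(3)} = \iota_2$.

$$P_{\phi_5} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

3.10 Definition The permutation expansion for determinants is

$$\begin{aligned} \begin{vmatrix} t_{1,1} & t_{1,2} & \cdots & t_{1,n} \\ t_{2,1} & t_{2,2} & \cdots & t_{2,n} \\ & \vdots & & \\ t_{n,1} & t_{n,2} & \cdots & t_{n,n} \end{vmatrix} &= t_{1,\phi_1(1)} t_{2,\phi_1(2)} \cdots t_{n,\phi_1(n)} |P_{\phi_1}| \\ &\quad + t_{1,\phi_2(1)} t_{2,\phi_2(2)} \cdots t_{n,\phi_2(n)} |P_{\phi_2}| \\ &\quad \;\; \vdots \\ &\quad + t_{1,\phi_k(1)} t_{2,\phi_k(2)} \cdots t_{n,\phi_k(n)} |P_{\phi_k}| \end{aligned}$$

where $\phi_1, \ldots, \phi_k$ are all of the n-permutations.

This formula is often written in summation notation

$$|T| = \sum_{\text{permutations } \phi} t_{1,\phi(1)} t_{2,\phi(2)} \cdots t_{n,\phi(n)} \, |P_\phi|$$

read aloud as, "the sum, over all permutations $\phi$, of terms having the form $t_{1,\phi(1)} t_{2,\phi(2)} \cdots t_{n,\phi(n)} |P_\phi|$."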

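3.11 Example The familiar 2x2 determinant formula follows from the above.

$$\begin{aligned} \begin{vmatrix} t_{1,1} & t_{1,2} \\ t_{2,1} & t_{2,2} \end{vmatrix} &= t_{1,1}t_{2,2} \cdot |P_{\phi_1}| + t_{1,2}t_{2,1} \cdot |P_{\phi_2}| \\ &= t_{1,1}t_{2,2} \cdot \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} + t_{1,2}t_{2,1} \cdot \begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} \\ &= t_{1,1}t_{2,2} - t_{1,2}t_{2,1} \end{aligned}$$

So does the 3x3 determinant formula.

$$\begin{aligned} \begin{vmatrix} t_{1,1} & t_{1,2} & t_{1,3} \\ t_{2,1} & t_{2,2} & t_{2,3} \\ t_{3,1} & t_{3,2} & t_{3,3} \end{vmatrix} &= t_{1,1}t_{2,2}t_{3,3}|P_{\phi_1}| + t_{1,1}t_{2,3}t_{3,2}|P_{\phi_2}| + t_{1,2}t_{2,1}t_{3,3}|P_{\phi_3}| \\ &\quad + t_{1,2}t_{2,3}t_{3,1}|P_{\phi_4}| + t_{1,3}t_{2,1}t_{3,2}|P_{\phi_5}| + t_{1,3}t_{2,2}t_{3,1}|P_{\phi_6}| \\ &= t_{1,1}t_{2,2}t_{3,3} - t_{1,1}t_{2,3}t_{3,2} - t_{1,2}t_{2,1}t_{3,3} \\ &\quad + t_{1,2}t_{2,3}t_{3,1} + t_{1,3}t_{2,1}t_{3,2} - t_{1,3}t_{2,2}t_{3,1} \end{aligned}$$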

Computing a determinant by permutation expansion usually takes longer 
than Gauss's Method. However, while it is not often used in practice, we use it 
for the theory, to prove that the determinant function is well-defined. 
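
For small matrices the permutation expansion can be coded directly. The sketch below is only an illustration under stated assumptions: both function names are hypothetical, permutations are 0-based tuples from `itertools.permutations`, and each $|P_\phi|$ is found by counting the row swaps that bring the permutation matrix to the identity.

```python
from itertools import permutations

def perm_matrix_det(phi):
    """|P_phi|, found by swapping rows until the identity remains.

    phi[i] is the (0-based) column holding the 1 in row i.
    """
    phi = list(phi)
    sign = 1
    for i in range(len(phi)):
        while phi[i] != i:
            j = phi[i]
            phi[i], phi[j] = phi[j], phi[i]   # swap rows i and j
            sign = -sign                       # each swap changes the sign
    return sign

def det_permutation_expansion(t):
    """Sum over all permutations phi of t[0][phi(0)] ... t[n-1][phi(n-1)] |P_phi|."""
    n = len(t)
    total = 0
    for phi in permutations(range(n)):
        term = perm_matrix_det(phi)
        for row in range(n):
            term *= t[row][phi[row]]
        total += term
    return total

print(det_permutation_expansion([[2, 1, -1], [4, 3, 0], [2, 1, 5]]))   # 12
```

The loop visits n! permutations, which is why this approach becomes slow as n grows.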

We will just state the result here and defer its proof to the following subsection.

3.12 Theorem For each n there is an nxn determinant function.

Also in the next subsection is the proof of this result (these two proofs share some features).

3.13 Theorem The determinant of a matrix equals the determinant of its transpose.

Because of this theorem, while we have so far stated determinant results in 
terms of rows (e.g., determinants are multilinear in their rows, row swaps change 
the sign, etc.), all of the results also hold in terms of columns. 

3.14 Corollary A matrix with two equal columns is singular. Column swaps 
change the sign of a determinant. Determinants are multilinear in their columns. 

Proof For the first statement, transposing the matrix results in a matrix with 
the same determinant, and with two equal rows, and hence a determinant of 
zero. Prove the other two in the same way. QED 

We finish this subsection with a summary: determinant functions exist, are 
unique, and we know how to compute them. As for what determinants are 
about, perhaps these lines [Kemp] help make it memorable. 

Determinant none, 

Solution: lots or none. 
Determinant some, 

Solution: just one. 

Exercises 

This summarizes the notation that we use for the 2- and 3- permutations.

    i           1  2
    phi_1(i)    1  2
    phi_2(i)    2  1

    i           1  2  3
    phi_1(i)    1  2  3
    phi_2(i)    1  3  2
    phi_3(i)    2  1  3
    phi_4(i)    2  3  1
    phi_5(i)    3  1  2
    phi_6(i)    3  2  1




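/ 3.15 Compute the determinant by using the permutation expansion.

(a) $\begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{vmatrix}$ (b) $\begin{vmatrix} 2 & 2 & 1 \\ 3 & -1 & 0 \\ -2 & 0 & 5 \end{vmatrix}$
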



/ 3.16 Compute these both with Gauss's Method and the permutation expansion 
formula. 

1 4 

2 3 

1 5 1 

/ 3.17 Use the permutation expansion formula to derive the formula for 3x3 determinants.

3.18 List all of the 4-permutations. 

3.19 A permutation, regarded as a function from the set {1, ..,n} to itself, is one-to- 
one and onto. Therefore, each permutation has an inverse. 

(a) Find the inverse of each 2-permutation. 

(b) Find the inverse of each 3-permutation.



(b)
3.20 Prove that f is multilinear if and only if for all $\vec{v}_1, \vec{v}_2 \in V$ and $k_1, k_2 \in \mathbb{R}$, this holds.

$$f(\rho_1, \ldots, k_1\vec{v}_1 + k_2\vec{v}_2, \ldots, \rho_n) = k_1 f(\rho_1, \ldots, \vec{v}_1, \ldots, \rho_n) + k_2 f(\rho_1, \ldots, \vec{v}_2, \ldots, \rho_n)$$

3.21 How would determinants change if we changed property (4) of the definition to read that |I| = 2?

3.22 Verify the second and third statements in Corollary 3.14.

/ 3.23 Show that if an nxn matrix has a nonzero determinant then we can express any column vector $\vec{v} \in \mathbb{R}^n$ as a linear combination of the columns of the matrix.

3.24 [Strang 80] True or false: a matrix whose entries are only zeros or ones has a 
determinant equal to zero, one, or negative one. 

3.25 (a) Show that there are 120 terms in the permutation expansion formula of a 
5x5 matrix. 

(b) How many are sure to be zero if the 1,2 entry is zero?

3.26 How many n-permutations are there? 

3.27 Show that the inverse of a permutation matrix is its transpose. 

3.28 A matrix $A$ is skew-symmetric if $A^{\mathsf{T}} = -A$, as in this matrix.

$$A = \begin{pmatrix} 0 & 3 \\ -3 & 0 \end{pmatrix}$$

Show that $n \times n$ skew-symmetric matrices with nonzero determinants exist only for
even $n$.

/ 3.29 What is the smallest number of zeros, and the placement of those zeros, needed 

to ensure that a 4 x 4 matrix has a determinant of zero? 
/ 3.30 If we have $n$ data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ and want to find a
polynomial $p(x) = a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_1 x + a_0$ passing through those
points then we can plug in the points to get an $n$ equation/$n$ unknown linear
system. The matrix of coefficients for that system is the Vandermonde matrix.
Prove that the determinant of the transpose of that matrix of coefficients

$$\begin{vmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ x_1^2 & x_2^2 & \cdots & x_n^2 \\ & \vdots & & \\ x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1} \end{vmatrix}$$

equals the product, over all indices $i, j \in \{1, \ldots, n\}$ with $i < j$, of terms of the form
$x_j - x_i$. (This shows that the determinant is zero, and the linear system has no
unique solution, if and only if the $x_i$'s in the data are not distinct.)
3.31 We can divide a matrix into blocks, as here. 



$$\left(\begin{array}{cc|c} 1 & 2 & 0 \\ 3 & 4 & 0 \\ \hline 0 & 0 & -2 \end{array}\right)$$

which shows four blocks, the square 2x2 and 1 x 1 ones in the upper left and lower 
right, and the zero blocks in the upper right and lower left. Show that if a matrix 
is such that we can partition it as

$$T = \begin{pmatrix} J & Z_2 \\ Z_1 & K \end{pmatrix}$$

where $J$ and $K$ are square, and $Z_1$ and $Z_2$ are all zeroes, then $|T| = |J| \cdot |K|$.
/ 3.32 Prove that for any nxn matrix T there are at most n distinct reals r such that 
the matrix T — rl has determinant zero (we shall use this result in Chapter Five). 



? 3.33 [Math. Mag., Jan. 1963, Q307] The nine positive digits can be arranged into
$3 \times 3$ arrays in $9!$ ways. Find the sum of the determinants of these arrays.

3.34 [Math. Mag., Jan. 1963, Q237] Show that

$$\begin{vmatrix} x-2 & x-3 & x-4 \\ x+1 & x-1 & x-3 \\ x-4 & x-7 & x-10 \end{vmatrix} = 0.$$

? 3.35 [Am. Math. Mon., Jan. 1949] Let S be the sum of the integer elements of a 
magic square of order three and let D be the value of the square considered as a 
determinant. Show that D/S is an integer. 

? 3.36 [Am. Math. Mon., Jun. 1931] Show that the determinant of the elements in 
the upper left corner of the Pascal triangle 

1111.. 
12 3.. 
13.. 
1 . . 



has the value unity. 



I.4 Determinants Exist

This subsection is optional. It proves two results from the prior subsection.
These proofs involve the properties of permutations, which we will use again
only in the optional Jordan Canonical Form subsection.

The prior subsection develops the permutation expansion formula for determinants.

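$$\begin{aligned}
\begin{vmatrix} t_{1,1} & t_{1,2} & \cdots & t_{1,n} \\ t_{2,1} & t_{2,2} & \cdots & t_{2,n} \\ & \vdots & & \\ t_{n,1} & t_{n,2} & \cdots & t_{n,n} \end{vmatrix}
&= t_{1,\phi_1(1)} t_{2,\phi_1(2)} \cdots t_{n,\phi_1(n)} |P_{\phi_1}| \\
&\quad + t_{1,\phi_2(1)} t_{2,\phi_2(2)} \cdots t_{n,\phi_2(n)} |P_{\phi_2}| \\
&\quad \;\vdots \\
&\quad + t_{1,\phi_k(1)} t_{2,\phi_k(2)} \cdots t_{n,\phi_k(n)} |P_{\phi_k}| \\
&= \sum_{\text{permutations } \phi} t_{1,\phi(1)} t_{2,\phi(2)} \cdots t_{n,\phi(n)} |P_\phi|
\end{aligned}$$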

This reduces the problem of showing that for any size n the determinant function 
on all nxn matrices is well-defined to only showing that the determinant is 
well-defined on the set of permutation matrices of that size. 

A permutation matrix can be row-swapped to the identity matrix and so we 
can calculate its determinant by keeping track of the number of swaps. However, 
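we still must show that the result is well-defined. Recall what the difficulty
is: the determinant of

$$P_\phi = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

could be computed with one swap

$$P_\phi \xrightarrow{\rho_1 \leftrightarrow \rho_2} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

or with three.

$$P_\phi \xrightarrow{\rho_3 \leftrightarrow \rho_1} \begin{pmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \xrightarrow{\rho_2 \leftrightarrow \rho_3} \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \xrightarrow{\rho_1 \leftrightarrow \rho_3} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Both reductions have an odd number of swaps so we figure that $|P_\phi| = -1$ but
if there were some way to do it with an even number of swaps then we would
have two different answers to one question. Below, Corollary 4.4 proves that
this cannot happen: there is no permutation matrix that can be row-swapped
to an identity matrix in two ways, one with an even number of swaps and the
other with an odd number of swaps.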

So the critical step will be a way to calculate whether the number of swaps 
that it takes could be even or odd. 

4.1 Definition In a permutation $\phi = (\ldots, k, \ldots, j, \ldots)$ elements such that $k > j$
are in an inversion of their natural order. Similarly, in a permutation matrix
two rows

$$P_\phi = \begin{pmatrix} \vdots \\ \iota_k \\ \vdots \\ \iota_j \\ \vdots \end{pmatrix}$$

such that $k > j$ are in an inversion.

4.2 Example This permutation matrix

$$\begin{pmatrix} \iota_3 \\ \iota_2 \\ \iota_1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$$

has three inversions: $\iota_3$ precedes $\iota_1$, $\iota_3$ precedes $\iota_2$, and $\iota_2$ precedes $\iota_1$.

4.3 Lemma A row-swap in a permutation matrix changes the number of inversions
from even to odd, or from odd to even.



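Proof Consider a swap of rows $j$ and $k$, where $k > j$. If the two rows are
adjacent

$$P_\phi = \begin{pmatrix} \vdots \\ \iota_{\phi(j)} \\ \iota_{\phi(k)} \\ \vdots \end{pmatrix} \xrightarrow{\rho_k \leftrightarrow \rho_j} \begin{pmatrix} \vdots \\ \iota_{\phi(k)} \\ \iota_{\phi(j)} \\ \vdots \end{pmatrix}$$

then since inversions involving rows not in this pair are not affected, the swap
changes the total number of inversions by one, either removing or producing one
inversion depending on whether $\phi(j) > \phi(k)$ or not. Consequently, the total
number of inversions changes from odd to even or from even to odd.

If the rows are not adjacent then we can swap them via a sequence of adjacent
swaps, first bringing row $k$ up

$$\begin{pmatrix} \vdots \\ \iota_{\phi(j)} \\ \iota_{\phi(j+1)} \\ \iota_{\phi(j+2)} \\ \vdots \\ \iota_{\phi(k)} \\ \vdots \end{pmatrix} \xrightarrow{\rho_k \leftrightarrow \rho_{k-1}} \; \xrightarrow{\rho_{k-1} \leftrightarrow \rho_{k-2}} \; \cdots \; \xrightarrow{\rho_{j+1} \leftrightarrow \rho_j} \begin{pmatrix} \vdots \\ \iota_{\phi(k)} \\ \iota_{\phi(j)} \\ \iota_{\phi(j+1)} \\ \vdots \\ \iota_{\phi(k-1)} \\ \vdots \end{pmatrix}$$

and then bringing row $j$ down.

$$\xrightarrow{\rho_{j+1} \leftrightarrow \rho_{j+2}} \; \xrightarrow{\rho_{j+2} \leftrightarrow \rho_{j+3}} \; \cdots \; \xrightarrow{\rho_{k-1} \leftrightarrow \rho_k} \begin{pmatrix} \vdots \\ \iota_{\phi(k)} \\ \iota_{\phi(j+1)} \\ \iota_{\phi(j+2)} \\ \vdots \\ \iota_{\phi(j)} \\ \vdots \end{pmatrix}$$

Each of these adjacent swaps changes the number of inversions from odd to even
or from even to odd. There are an odd number $(k - j) + (k - j - 1)$ of them.
The total change in the number of inversions is from even to odd or from odd to
even. QED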

4.4 Corollary If a permutation matrix has an odd number of inversions then 
swapping it to the identity takes an odd number of swaps. If it has an even 
number of inversions then swapping to the identity takes an even number of 
swaps. 






Chapter Four. Determinants 



Proof The identity matrix has zero inversions. By Lemma 4.3 each swap changes
the number of inversions from odd to even or from even to odd. So to change an
odd number to zero requires an odd number of swaps, and to change an even
number to zero requires an even number of swaps. QED
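
Corollary 4.4 can be spot-checked by machine. The following is a minimal Python sketch, not part of the original text; the names `inversions` and `swaps_to_identity` are illustrative only. It represents a permutation matrix by the tuple of its row indices (row $i$ is $\iota_{\phi(i)}$), takes a deliberately wasteful route to the identity by making some arbitrary detour swaps first, and compares the parity of the swap count with the parity of the inversion count.

```python
import random


def inversions(perm):
    """Count pairs of positions a < b whose entries are out of natural order."""
    n = len(perm)
    return sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])


def swaps_to_identity(perm, detours, rng):
    """Row-swap perm to (1, 2, ..., n) and return the number of swaps used.

    The first `detours` swaps are arbitrary, so different calls reach the
    identity by routes of different lengths.
    """
    perm = list(perm)
    count = 0
    for _ in range(detours):
        i, j = rng.sample(range(len(perm)), 2)  # two distinct rows
        perm[i], perm[j] = perm[j], perm[i]
        count += 1
    for i in range(len(perm)):
        if perm[i] != i + 1:  # row i is out of place, so swap the right row in
            j = perm.index(i + 1)
            perm[i], perm[j] = perm[j], perm[i]
            count += 1
    return count


if __name__ == "__main__":
    rng = random.Random(0)
    phi = (2, 1, 3, 4)  # the 4x4 permutation matrix discussed above
    for detours in range(5):
        used = swaps_to_identity(phi, detours, rng)
        print(detours, used, used % 2 == inversions(phi) % 2)
```

The swap counts differ from run to run, but under the corollary the final column should always read `True`.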



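4.5 Definition The signum of a permutation $\operatorname{sgn}(\phi)$ is $-1$ if the number of
inversions in $\phi$ is odd and is $+1$ if the number of inversions is even.

4.6 Example In the notation for the 3-permutations from Example 3.8 we have

$$P_{\phi_1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad P_{\phi_2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$

so $\operatorname{sgn}(\phi_1) = 1$ because there are no inversions, while $\operatorname{sgn}(\phi_2) = -1$ because
there is one.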
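
As an illustration of Definition 4.5 together with Lemma 4.3, here is a short Python sketch (an addition for illustration; the helper names `sgn` and `swapped` are assumptions, not the book's notation). It lists the signum of each 3-permutation in the order $\phi_1, \ldots, \phi_6$ and checks that interchanging any two entries flips the sign.

```python
from itertools import combinations, permutations


def sgn(perm):
    """Signum from Definition 4.5: -1 for an odd inversion count, +1 for even."""
    n = len(perm)
    inv = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
    return -1 if inv % 2 else 1


def swapped(perm, i, j):
    """Return perm with the entries in positions i and j interchanged."""
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return tuple(out)


# itertools yields the 3-permutations in lexicographic order: phi_1, ..., phi_6
three_perms = list(permutations((1, 2, 3)))
signs = [sgn(p) for p in three_perms]
flips = all(
    sgn(swapped(p, i, j)) == -sgn(p)
    for p in three_perms
    for i, j in combinations(range(3), 2)
)
print(signs, flips)
```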

We still have not shown that the determinant function is well-defined because
we have not considered row operations on permutation matrices other than row
swaps. We will finesse this issue. We will define a function $d\colon \mathcal{M}_{n \times n} \to \mathbb{R}$ by
altering the permutation expansion formula, replacing $|P_\phi|$ with $\operatorname{sgn}(\phi)$.

$$d(T) = \sum_{\text{permutations } \phi} t_{1,\phi(1)} t_{2,\phi(2)} \cdots t_{n,\phi(n)} \operatorname{sgn}(\phi)$$

This gives the same value as the permutation expansion because the corollary
shows that $\det(P_\phi) = \operatorname{sgn}(\phi)$. The advantage of this formula is that the number
of inversions is clearly well-defined: just count them. Therefore, we will finish
showing that an $n \times n$ determinant function exists by showing that this $d$ satisfies
the conditions in the determinant's definition.
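
The formula for $d$ translates almost word for word into code. The sketch below is illustrative Python, not from the original text; it stores a matrix as a list of rows, uses 0-based column indices, and the names `sgn` and `d` simply mirror the symbols above.

```python
from itertools import permutations


def sgn(perm):
    """+1 if perm has an even number of inversions, -1 if odd."""
    n = len(perm)
    inv = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
    return -1 if inv % 2 else 1


def d(T):
    """Permutation expansion with sgn(phi) in place of |P_phi|."""
    n = len(T)
    total = 0
    for phi in permutations(range(n)):  # phi[row] is the column picked in that row
        term = sgn(phi)
        for row in range(n):
            term *= T[row][phi[row]]
        total += term
    return total


print(d([[1, 2], [3, 4]]))
print(d([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

The sum has $n!$ terms, so this is a way to see the definition at work on small matrices rather than a practical method.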

4.7 Lemma The function $d$ above is a determinant. Hence determinants exist
for every $n$.

Proof We must check that it has the four properties from the definition.
Property (4) is easy; where $I$ is the $n \times n$ identity, in

$$d(I) = \sum_{\text{perm } \phi} \iota_{1,\phi(1)} \iota_{2,\phi(2)} \cdots \iota_{n,\phi(n)} \operatorname{sgn}(\phi)$$

all of the terms in the summation are zero except for the product down the
diagonal, which is one.

For property (3) consider $d(\hat{T})$ where $T \xrightarrow{k\rho_i} \hat{T}$.

$$\sum_{\text{perm } \phi} \hat{t}_{1,\phi(1)} \cdots \hat{t}_{i,\phi(i)} \cdots \hat{t}_{n,\phi(n)} \operatorname{sgn}(\phi) = \sum_{\phi} t_{1,\phi(1)} \cdots k t_{i,\phi(i)} \cdots t_{n,\phi(n)} \operatorname{sgn}(\phi)$$

Factor out the $k$ to get the desired equality.

$$= k \cdot \sum_{\phi} t_{1,\phi(1)} \cdots t_{i,\phi(i)} \cdots t_{n,\phi(n)} \operatorname{sgn}(\phi) = k \cdot d(T)$$

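For (2) suppose that $T \xrightarrow{\rho_i \leftrightarrow \rho_j} \hat{T}$. We must show that $d(\hat{T})$ is the negative of $d(T)$.

$$d(\hat{T}) = \sum_{\text{perm } \phi} \hat{t}_{1,\phi(1)} \cdots \hat{t}_{i,\phi(i)} \cdots \hat{t}_{j,\phi(j)} \cdots \hat{t}_{n,\phi(n)} \operatorname{sgn}(\phi) \qquad (*)$$

We will show that each term in $(*)$ is associated with a term in $d(T)$, and that the
two terms are negatives of each other. Consider the matrix from the multilinear
expansion of $d(\hat{T})$ giving the term $\hat{t}_{1,\phi(1)} \cdots \hat{t}_{i,\phi(i)} \cdots \hat{t}_{j,\phi(j)} \cdots \hat{t}_{n,\phi(n)} \operatorname{sgn}(\phi)$.
Its row $i$ has the single entry $\hat{t}_{i,\phi(i)}$, in column $\phi(i)$, and its row $j$ has the single
entry $\hat{t}_{j,\phi(j)}$, in column $\phi(j)$. It is the result of the $\rho_i \leftrightarrow \rho_j$ operation performed on
the matrix whose row $i$ has the single entry $t_{i,\phi(j)}$, in column $\phi(j)$, and whose row $j$
has the single entry $t_{j,\phi(i)}$, in column $\phi(i)$.

That is, the term with hatted $t$'s is associated with this term from the $d(T)$
expansion: $t_{1,\sigma(1)} \cdots t_{j,\sigma(j)} \cdots t_{i,\sigma(i)} \cdots t_{n,\sigma(n)} \operatorname{sgn}(\sigma)$, where the permutation
$\sigma$ equals $\phi$ but with the $i$-th and $j$-th numbers interchanged, $\sigma(i) = \phi(j)$ and
$\sigma(j) = \phi(i)$. The two terms have the same multiplicands $\hat{t}_{1,\phi(1)} = t_{1,\sigma(1)}$,
$\ldots$, including the entries from the swapped rows $\hat{t}_{i,\phi(i)} = t_{j,\phi(i)} = t_{j,\sigma(j)}$ and
$\hat{t}_{j,\phi(j)} = t_{i,\phi(j)} = t_{i,\sigma(i)}$. But the two terms are negatives of each other since
$\operatorname{sgn}(\phi) = -\operatorname{sgn}(\sigma)$ by Lemma 4.3.

Now, any permutation $\phi$ can be derived from some other permutation $\sigma$ by
such a swap, in one and only one way. Therefore the summation in $(*)$ is in fact
a sum over all permutations, taken once and only once.

$$\begin{aligned}
d(\hat{T}) &= \sum_{\text{perm } \phi} \hat{t}_{1,\phi(1)} \cdots \hat{t}_{i,\phi(i)} \cdots \hat{t}_{j,\phi(j)} \cdots \hat{t}_{n,\phi(n)} \operatorname{sgn}(\phi) \\
&= \sum_{\text{perm } \sigma} t_{1,\sigma(1)} \cdots t_{j,\sigma(j)} \cdots t_{i,\sigma(i)} \cdots t_{n,\sigma(n)} \cdot \bigl(-\operatorname{sgn}(\sigma)\bigr)
\end{aligned}$$
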

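Thus $d(\hat{T}) = -d(T)$.

For property (1) suppose that $T \xrightarrow{k\rho_i + \rho_j} \hat{T}$ and consider the effect of a row
combination.

$$\begin{aligned}
d(\hat{T}) &= \sum_{\text{perm } \phi} \hat{t}_{1,\phi(1)} \cdots \hat{t}_{i,\phi(i)} \cdots \hat{t}_{j,\phi(j)} \cdots \hat{t}_{n,\phi(n)} \operatorname{sgn}(\phi) \\
&= \sum_{\phi} t_{1,\phi(1)} \cdots t_{i,\phi(i)} \cdots (k t_{i,\phi(j)} + t_{j,\phi(j)}) \cdots t_{n,\phi(n)} \operatorname{sgn}(\phi)
\end{aligned}$$

Do the algebra.

$$\begin{aligned}
&= \sum_{\phi} \bigl[\, t_{1,\phi(1)} \cdots t_{i,\phi(i)} \cdots k t_{i,\phi(j)} \cdots t_{n,\phi(n)} \operatorname{sgn}(\phi) \\
&\qquad\quad + t_{1,\phi(1)} \cdots t_{i,\phi(i)} \cdots t_{j,\phi(j)} \cdots t_{n,\phi(n)} \operatorname{sgn}(\phi) \,\bigr] \\
&= \sum_{\phi} t_{1,\phi(1)} \cdots t_{i,\phi(i)} \cdots k t_{i,\phi(j)} \cdots t_{n,\phi(n)} \operatorname{sgn}(\phi) \\
&\qquad + \sum_{\phi} t_{1,\phi(1)} \cdots t_{i,\phi(i)} \cdots t_{j,\phi(j)} \cdots t_{n,\phi(n)} \operatorname{sgn}(\phi) \\
&= k \cdot \sum_{\phi} t_{1,\phi(1)} \cdots t_{i,\phi(i)} \cdots t_{i,\phi(j)} \cdots t_{n,\phi(n)} \operatorname{sgn}(\phi) + d(T)
\end{aligned}$$

Finish by observing that the terms $t_{1,\phi(1)} \cdots t_{i,\phi(i)} \cdots t_{i,\phi(j)} \cdots t_{n,\phi(n)} \operatorname{sgn}(\phi)$
add to zero: this sum represents $d(S)$ where $S$ is a matrix equal to $T$ except
that row $j$ of $S$ is a copy of row $i$ of $T$ (because the factor is $t_{i,\phi(j)}$, not $t_{j,\phi(j)}$)
and so $S$ has two equal rows, rows $i$ and $j$. Since we have already shown that $d$
changes sign on row swaps, as in Lemma 2.3 we conclude that $d(S) = 0$. QED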
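
The four properties just proved can also be observed numerically. This is a small Python sketch added for illustration, under the assumption that matrices are lists of rows with 0-based indices; the sample matrix `T` and the helper names `combine`, `swap`, and `scale` are hypothetical choices, not the book's.

```python
from itertools import permutations


def d(T):
    """Permutation expansion using the signum of each permutation."""
    n = len(T)
    total = 0
    for phi in permutations(range(n)):
        inv = sum(1 for a in range(n) for b in range(a + 1, n) if phi[a] > phi[b])
        term = -1 if inv % 2 else 1
        for row in range(n):
            term *= T[row][phi[row]]
        total += term
    return total


def combine(T, k, i, j):
    """Row combination: replace row j with k*(row i) + (row j)."""
    out = [row[:] for row in T]
    out[j] = [k * a + b for a, b in zip(T[i], T[j])]
    return out


def swap(T, i, j):
    """Row swap: interchange rows i and j."""
    out = [row[:] for row in T]
    out[i], out[j] = out[j], out[i]
    return out


def scale(T, k, i):
    """Row rescaling: multiply row i by k."""
    out = [row[:] for row in T]
    out[i] = [k * a for a in T[i]]
    return out


T = [[2, 0, -1], [0, 3, 0], [2, 1, 1]]  # sample matrix chosen for illustration
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(d(T), d(combine(T, 5, 0, 2)), d(swap(T, 0, 1)), d(scale(T, 7, 1)), d(I3))
```

A check on one matrix is of course no substitute for the proof; it only shows what each property says in a concrete case.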

We have now shown that determinant functions exist for each size. We 
already know that for each size there is at most one determinant. Therefore, the 
permutation expansion computes the one and only determinant. 

We end this subsection by proving the other result remaining from the prior 
subsection. 

4.8 Theorem The determinant of a matrix equals the determinant of its transpose. 

Proof Call the matrix $T$ and denote the entries of $T^{\mathsf{T}}$ with $s$'s so that $t_{i,j} = s_{j,i}$.
Substitution gives this

$$|T| = \sum_{\text{perms } \phi} t_{1,\phi(1)} \cdots t_{n,\phi(n)} \operatorname{sgn}(\phi) = \sum_{\phi} s_{\phi(1),1} \cdots s_{\phi(n),n} \operatorname{sgn}(\phi)$$

and we will finish the argument by manipulating the expression on the right
to be recognizable as the determinant of the transpose. We have written all
permutation expansions with the row indices ascending. To rewrite the expression
on the right in this way, note that because $\phi$ is a permutation the row indices
$\phi(1), \ldots, \phi(n)$ are just the numbers $1, \ldots, n$, rearranged. Apply commutativity
to have these ascend, giving $s_{1,\phi^{-1}(1)} \cdots s_{n,\phi^{-1}(n)}$.

$$= \sum_{\phi^{-1}} s_{1,\phi^{-1}(1)} \cdots s_{n,\phi^{-1}(n)} \operatorname{sgn}(\phi^{-1})$$

Exercise 4.14 shows that $\operatorname{sgn}(\phi^{-1}) = \operatorname{sgn}(\phi)$. Since every permutation is the
inverse of another, a sum over all inverses $\phi^{-1}$ is a sum over all permutations

$$= \sum_{\text{perms } \sigma} s_{1,\sigma(1)} \cdots s_{n,\sigma(n)} \operatorname{sgn}(\sigma) = |T^{\mathsf{T}}|$$

as required. QED
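
The step in this proof that rests on the signum of an inverse can be checked by brute force for small sizes. The Python sketch below is an illustrative addition; the function names are assumptions. It writes a permutation as the tuple of values $\phi(1), \ldots, \phi(n)$, builds the inverse, and compares inversion parities over all 4-permutations.

```python
from itertools import permutations


def inversions(perm):
    """Number of pairs of positions holding entries out of natural order."""
    n = len(perm)
    return sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])


def inverse(perm):
    """Inverse of a permutation given as the tuple phi(1), ..., phi(n)."""
    inv = [0] * len(perm)
    for i, value in enumerate(perm, start=1):
        inv[value - 1] = i  # phi(i) = value, so the inverse sends value to i
    return tuple(inv)


same_parity = all(
    inversions(p) % 2 == inversions(inverse(p)) % 2
    for p in permutations(range(1, 5))
)
print(inverse((2, 3, 1)), same_parity)
```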
Exercises 

These summarize the notation used in this book for the 2- and 3-permutations.

    i              1  2          i              1  2  3
    $\phi_1(i)$    1  2          $\phi_1(i)$    1  2  3
    $\phi_2(i)$    2  1          $\phi_2(i)$    1  3  2
                                 $\phi_3(i)$    2  1  3
                                 $\phi_4(i)$    2  3  1
                                 $\phi_5(i)$    3  1  2
                                 $\phi_6(i)$    3  2  1

4.9 Give the permutation expansion of a general 2x2 matrix and its transpose. 
/ 4.10 This problem appears also in the prior subsection. 

(a) Find the inverse of each 2-permutation. 

(b) Find the inverse of each 3-permutation. 

/ 4.11 (a) Find the signum of each 2-permutation. 
(b) Find the signum of each 3-permutation. 

4.12 Find the only nonzero term in the permutation expansion of this matrix.

$$\begin{vmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{vmatrix}$$

Compute that determinant by finding the signum of the associated permutation.

4.13 [Strang 80] What is the signum of the $n$-permutation $\phi = (n, n-1, \ldots, 2, 1)$?

4.14 Prove these.

(a) Every permutation has an inverse.

(b) $\operatorname{sgn}(\phi^{-1}) = \operatorname{sgn}(\phi)$

(c) Every permutation is the inverse of another.

4.15 Prove that the matrix of the permutation inverse is the transpose of the matrix
of the permutation $P_{\phi^{-1}} = {P_\phi}^{\mathsf{T}}$, for any permutation $\phi$.

/ 4.16 Show that a permutation matrix with m inversions can be row swapped to the 

identity in m steps. Contrast this with Corollary 4.4. 
/ 4.17 For any permutation $\phi$ let $g(\phi)$ be the integer defined in this way.

$$g(\phi) = \prod_{i < j} [\phi(j) - \phi(i)]$$

(This is the product, over all indices $i$ and $j$ with $i < j$, of terms of the given
form.)

(a) Compute the value of $g$ on all 2-permutations.

(b) Compute the value of $g$ on all 3-permutations.

(c) Prove that $g(\phi)$ is not $0$.

(d) Prove this.

$$\operatorname{sgn}(\phi) = \frac{g(\phi)}{|g(\phi)|}$$

Many authors give this formula as the definition of the signum function.



II Geometry of Determinants

The prior section develops the determinant algebraically, by considering formulas 
satisfying certain properties. This section complements that with a geometric 
approach. One advantage of this approach is that while we have so far only 
considered whether or not a determinant is zero, here we shall give a meaning to 
the value of the determinant. (The prior section treats determinants as functions 
of the rows but in this section we focus on columns.) 



II.1 Determinants as Size Functions

This parallelogram picture is familiar from the construction of the sum of the 
two vectors. 




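1.1 Definition In $\mathbb{R}^n$ the box (or parallelepiped) formed by $\langle \vec{v}_1, \ldots, \vec{v}_n \rangle$ is the
set $\{ t_1\vec{v}_1 + \cdots + t_n\vec{v}_n \mid t_1, \ldots, t_n \in [0..1] \}$.

Thus, the parallelogram shown above is the box defined by $\langle \binom{x_1}{y_1}, \binom{x_2}{y_2} \rangle$.
We are interested in the area of the box. One way to compute it is to draw
this rectangle and subtract the area of each subregion.

$$\begin{aligned}
\text{area of parallelogram} &= \text{area of rectangle} - \text{area of } A - \text{area of } B - \cdots - \text{area of } F \\
&= (x_1 + x_2)(y_1 + y_2) - x_2 y_1 - x_1 y_1/2 - x_2 y_2/2 - x_2 y_2/2 - x_1 y_1/2 - x_2 y_1 \\
&= x_1 y_2 - x_2 y_1
\end{aligned}$$

The fact that the area equals the value of the determinant

$$\begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} = x_1 y_2 - x_2 y_1$$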



is no coincidence. The properties from the definition of determinants make 
good postulates for a function that measures the size of the box defined by the 
matrix's columns. 
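
To make the subtraction concrete, here is the computation for one sample pair of vectors. The numbers are chosen only for illustration and do not appear in the original text: take $(x_1, y_1) = (3, 1)$ and $(x_2, y_2) = (1, 2)$.

$$\begin{aligned}
\text{area of rectangle} &= (3 + 1)(1 + 2) = 12 \\
\text{sum of the six subregions} &= x_2 y_1 + \tfrac{1}{2} x_1 y_1 + \tfrac{1}{2} x_2 y_2 + \tfrac{1}{2} x_2 y_2 + \tfrac{1}{2} x_1 y_1 + x_2 y_1 \\
&= 1 + \tfrac{3}{2} + 1 + 1 + \tfrac{3}{2} + 1 = 7 \\
\text{area of parallelogram} &= 12 - 7 = 5 = 3 \cdot 2 - 1 \cdot 1 = \begin{vmatrix} 3 & 1 \\ 1 & 2 \end{vmatrix}
\end{aligned}$$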

For instance, a function that measures the size of the box should have the
property that multiplying one of the box-defining vectors by a scalar (here
$k = 1.4$) will multiply the size by that scalar.

(On the right the rescaled region is in solid lines with the original region in
shade for comparison.) That is, we can reasonably expect of a size measure that
$\operatorname{size}(\ldots, k\vec{v}, \ldots) = k \cdot \operatorname{size}(\ldots, \vec{v}, \ldots)$. Of course, this property is familiar from
the definition of determinants.

Another property of determinants that should apply to any function giving
the size of a box is that it is unaffected by combining rows. Here are
before-combining and after-combining boxes (the scalar shown is $k = 0.35$). The box
formed by $\vec{v}$ and $k\vec{v} + \vec{w}$ is more slanted than the original one but the two have
the same base and the same height and hence the same area.

(As before, the figure on the right has the original region in shade for comparison.)
So we expect that $\operatorname{size}(\ldots, \vec{v}, \ldots, \vec{w}, \ldots) = \operatorname{size}(\ldots, \vec{v}, \ldots, k\vec{v} + \vec{w}, \ldots)$; again, a
restatement of a determinant postulate.
Lastly, we expect that $\operatorname{size}(\vec{e}_1, \vec{e}_2) = 1$

and we naturally extend that to any number of dimensions $\operatorname{size}(\vec{e}_1, \ldots, \vec{e}_n) = 1$.
Because property (2) of determinants is redundant (as remarked following 
the definition) we have that the properties of the determinant function are 
reasonable to expect of a function that gives the size of boxes. The prior section 
starts with these properties and shows that the determinant exists and is unique, 
so we know that these postulates are consistent and that we do not need any 
more postulates. Thus, we will interpret det(vi , . . . , Vn) as the size of the box 
formed by the vectors. 

1.2 Remark Although property (2) of the definition of determinants is redundant 
it raises an important point. Consider these two. 




Swapping changes the sign. On the left we take $\vec{u}$ first in the matrix and then
follow the counterclockwise arc to $\vec{v}$, and get a positive size. On the right
following the clockwise arc gives a negative size.
The sign returned by the size function reflects the orientation or sense of the
box. (We see the same thing if we picture the effect of scalar multiplication by a
negative scalar.)

Although it is both interesting and important, we don't need the idea of 
orientation for the development below and so we will pass it by. (See Exercise 27.) 

1.3 Definition The volume of a box is the absolute value of the determinant of a 
matrix with those vectors as columns. 



1.4 Example By the formula that takes the area of the base times the height, the 
volume of this parallelepiped is 12. That agrees with the determinant. 




$$\begin{vmatrix} 2 & 0 & -1 \\ 0 & 3 & 0 \\ 2 & 1 & 1 \end{vmatrix} = 12$$


We can also compute the volume as the absolute value of this determinant. 



2 
3 3 

1 2 1 



-12 



The next result describes some of the geometry of the linear functions that
act on $\mathbb{R}^n$.

1.5 Theorem A transformation $t\colon \mathbb{R}^n \to \mathbb{R}^n$ changes the size of all boxes by the
same factor, namely the size of the image of a box $|t(S)|$ is $|T|$ times the size of
the box $|S|$, where $T$ is the matrix representing $t$ with respect to the standard
basis.

That is, for all $n \times n$ matrices, the determinant of a product is the product
of the determinants $|TS| = |T| \cdot |S|$.

The two sentences say the same thing, first in map terms and then in matrix
terms. This is because $|t(S)| = |TS|$, as both give the size of the box that is
the image of the unit box $\mathcal{E}_n$ under the composition $t \circ s$ (where $s$ is the map
represented by $S$ with respect to the standard basis).

Proof First consider the $|T| = 0$ case. A matrix has a zero determinant if and
only if it is not invertible. Observe that if $TS$ is invertible then there is an $M$
such that $(TS)M = I$, so $T(SM) = I$, which shows that $T$ is invertible, with
inverse $SM$. By contrapositive, if $T$ is not invertible then neither is $TS$; that is, if
$|T| = 0$ then $|TS| = 0$.

Now consider the case that $|T| \neq 0$, that $T$ is nonsingular. Recall that any
nonsingular matrix factors into a product of elementary matrices $T = E_1 E_2 \cdots E_r$.
To finish this argument we will verify that $|ES| = |E| \cdot |S|$ for all matrices $S$ and
elementary matrices $E$. The result will then follow because $|TS| = |E_1 \cdots E_r S| =
|E_1| \cdots |E_r| \cdot |S| = |E_1 \cdots E_r| \cdot |S| = |T| \cdot |S|$.

There are three kinds of elementary matrix. We will cover the $M_i(k)$ case;
the $P_{i,j}$ and $C_{i,j}(k)$ checks are similar. We have that $M_i(k)S$ equals $S$ except
that row $i$ is multiplied by $k$. The third property of determinant functions
then gives that $|M_i(k)S| = k \cdot |S|$. But $|M_i(k)| = k$, again by the third property
because $M_i(k)$ is derived from the identity by multiplication of row $i$ by $k$. Thus
$|ES| = |E| \cdot |S|$ holds for $E = M_i(k)$. QED
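
The key step of the proof, $|ES| = |E| \cdot |S|$ for each kind of elementary matrix, is easy to try out in the $2 \times 2$ case. The following Python sketch is an illustrative addition; the matrices `S` and `T` and the value `k = 5` are sample values chosen here, and the helper names are assumptions.

```python
def det2(M):
    """Determinant of a 2x2 matrix given as a list of two rows."""
    (a, b), (c, d) = M
    return a * d - b * c


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


k = 5
S = [[2, 1], [1, 2]]  # sample matrix
elementary = {
    "M_1(k)": [[k, 0], [0, 1]],      # multiply row 1 by k
    "P_{1,2}": [[0, 1], [1, 0]],     # swap rows 1 and 2
    "C_{1,2}(k)": [[1, 0], [k, 1]],  # add k times row 1 to row 2
}
for name, E in elementary.items():
    print(name, det2(matmul(E, S)), det2(E) * det2(S))

T = [[1, 1], [-2, 0]]  # sample transformation matrix with determinant 2
print(det2(matmul(T, S)), det2(T) * det2(S))
```

Each printed pair should agree, as the theorem asserts.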

1.6 Example Application of the map t represented with respect to the standard 
bases by 

a:) 



will double sizes of boxes, e.g., from this 



2 1 
1 2 



to this 




1.7 Corollary If a matrix is invertible then the determinant of its inverse is the
inverse of its determinant $|T^{-1}| = 1/|T|$.

Proof $1 = |I| = |TT^{-1}| = |T| \cdot |T^{-1}|$ QED

Recall that determinants are not additive homomorphisms, that $\det(A + B)$
need not equal $\det(A) + \det(B)$. In contrast, the above theorem says that
determinants are multiplicative homomorphisms: $\det(AB)$ equals $\det(A) \cdot \det(B)$.



Exercises 

1.8 Find the volume of the region defined by the vectors.
/ 1.9 Is

(i) 

inside of the box formed by these three? 

(i) (•) (• 

/ 1.10 Find the volume of this region. 



/ 1.11 Suppose that |A| = 3. By what factor do these change volumes? 

(a) A (b) (c) A-2 

/ 1.12 By what factor does each transformation change the size of boxes? 

G) - e) G) - - 

1.13 What is the area of the image of the rectangle [2. .4] x [2. .5] under the action of 
this matrix? 

1.14 If $t\colon \mathbb{R}^3 \to \mathbb{R}^3$ changes volumes by a factor of 7 and $s\colon \mathbb{R}^3 \to \mathbb{R}^3$ changes volumes
by a factor of 3/2 then by what factor will their composition change volumes?

1.15 In what way does the definition of a box differ from the definition of a span? 
/ 1.16 Why doesn't this picture contradict Theorem 1.5? 



(o ].i 



area is 2 determinant is 2 area is 5 

/ 1.17 Does |TS| = |ST|? |T(SP)| = |(TS)P|? 

1.18 (a) Suppose that |A| = 3 and that |B| = 2. Find lA^ ■ • R-^ . a^\. 
(b) Assume that |A| = 0. Prove that leA'' + 5A^ + 2A] = 0. 
/ 1.19 Let $T$ be the matrix representing (with respect to the standard bases) the map
that rotates plane vectors counterclockwise thru $\theta$ radians. By what factor does $T$
change sizes?

/ 1.20 Must a transformation $t\colon \mathbb{R}^2 \to \mathbb{R}^2$ that preserves areas also preserve lengths?
/ 1.21 What is the volume of a parallelepiped in $\mathbb{R}^3$ bounded by a linearly dependent
set?

/ 1.22 Find the area of the triangle in $\mathbb{R}^3$ with endpoints $(1,2,1)$, $(3,-1,4)$, and
$(2,2,2)$. (Area, not volume. The triangle defines a plane; what is the area of the
triangle in that plane?)

/ 1.23 An alternate proof of Theorem 1.5 uses the definition of determinant functions.

(a) Note that the vectors forming $S$ make a linearly dependent set if and only if
$|S| = 0$, and check that the result holds in this case.

(b) For the $|S| \neq 0$ case, to show that $|TS|/|S| = |T|$ for all transformations, consider
the function $d\colon \mathcal{M}_{n \times n} \to \mathbb{R}$ given by $T \mapsto |TS|/|S|$. Show that $d$ has the first
property of a determinant.

(c) Show that $d$ has the remaining three properties of a determinant function.

(d) Conclude that $|TS| = |T| \cdot |S|$.

1.24 Give a non-identity matrix with the property that $A^{\mathsf{T}} = A^{-1}$. Show that if
$A^{\mathsf{T}} = A^{-1}$ then $|A| = \pm 1$. Does the converse hold?

1.25 The algebraic property of determinants that factoring a scalar out of a single
row will multiply the determinant by that scalar shows that where $H$ is $3 \times 3$, the
determinant of $cH$ is $c^3$ times the determinant of $H$. Explain this geometrically,
that is, using Theorem 1.5. (The observation that increasing the linear size of a
three-dimensional object by a factor of $c$ will increase its volume by a factor of $c^3$
while only increasing its surface area by an amount proportional to a factor of $c^2$
is the Square-cube law [Wikipedia Square-cube Law].)
/ 1.26 We say that matrices $H$ and $G$ are similar if there is a nonsingular matrix $P$
such that $H = P^{-1}GP$ (we will study this relation in Chapter Five). Show that
similar matrices have the same determinant.
1.27 We usually represent vectors in $\mathbb{R}^2$ with respect to the standard basis so vectors
in the first quadrant have both coordinates positive.

-3^ 



Rep£ 



-2 



BHi'?),r:i) 



Moving counterclockwise around the origin, we cycle thru four regions: 

... ^ (+^ 

Using this basis 

gives the same counterclockwise cycle. We say these two bases have the same 
orientation. 

(a) Why do they give the same cycle? 

(b) What other configurations of unit vectors on the axes give the same cycle? 

(c) Find the determinants of the matrices formed from those (ordered) bases. 

(d) What other counterclockwise cycles are possible, and what are the associated 
determinants? 

(e) What happens in $\mathbb{R}^1$?

(f) What happens in $\mathbb{R}^3$?

A fascinating general-audience discussion of orientations is in [Gardner]. 
1.28 This question uses material from the optional Determinant Functions Exist 
subsection. Prove Theorem 1.5 by using the permutation expansion formula for 
the determinant. 

/ 1.29 (a) Show that this gives the equation of a line in $\mathbb{R}^2$ thru $(x_2, y_2)$ and $(x_3, y_3)$.

$$\begin{vmatrix} x & x_2 & x_3 \\ y & y_2 & y_3 \\ 1 & 1 & 1 \end{vmatrix} = 0$$

(b) [Petersen] Prove that the area of a triangle with vertices $(x_1, y_1)$, $(x_2, y_2)$, and
$(x_3, y_3)$ is

$$\frac{1}{2} \begin{vmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ 1 & 1 & 1 \end{vmatrix}.$$

(c) [Math. Mag., Jan. 1973] Prove that the area of a triangle with vertices at
$(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ whose coordinates are integers has an area of $N$ or
$N/2$ for some positive integer $N$.

III Laplace's Expansion

Determinants are a font of interesting and amusing formulas. Here is one that is 
often used to compute determinants by hand. 



III.1 Laplace's Expansion Formula

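1.1 Example In this permutation expansion

$$\begin{aligned}
\begin{vmatrix} t_{1,1} & t_{1,2} & t_{1,3} \\ t_{2,1} & t_{2,2} & t_{2,3} \\ t_{3,1} & t_{3,2} & t_{3,3} \end{vmatrix}
&= t_{1,1}t_{2,2}t_{3,3} \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix}
 + t_{1,1}t_{2,3}t_{3,2} \begin{vmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{vmatrix} \\
&\quad + t_{1,2}t_{2,1}t_{3,3} \begin{vmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{vmatrix}
 + t_{1,2}t_{2,3}t_{3,1} \begin{vmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{vmatrix} \\
&\quad + t_{1,3}t_{2,1}t_{3,2} \begin{vmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix}
 + t_{1,3}t_{2,2}t_{3,1} \begin{vmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{vmatrix}
\end{aligned}$$

we can factor out the entries from the first row $t_{1,1}$, $t_{1,2}$, $t_{1,3}$

$$\begin{aligned}
&= t_{1,1} \cdot \left[ t_{2,2}t_{3,3} \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} + t_{2,3}t_{3,2} \begin{vmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{vmatrix} \right] \\
&\quad + t_{1,2} \cdot \left[ t_{2,1}t_{3,3} \begin{vmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{vmatrix} + t_{2,3}t_{3,1} \begin{vmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{vmatrix} \right] \\
&\quad + t_{1,3} \cdot \left[ t_{2,1}t_{3,2} \begin{vmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix} + t_{2,2}t_{3,1} \begin{vmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{vmatrix} \right]
\end{aligned}$$

and in the permutation matrices swap to get the first rows into place.

$$\begin{aligned}
&= t_{1,1} \cdot \left[ t_{2,2}t_{3,3} \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} + t_{2,3}t_{3,2} \begin{vmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{vmatrix} \right] \\
&\quad - t_{1,2} \cdot \left[ t_{2,1}t_{3,3} \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} + t_{2,3}t_{3,1} \begin{vmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{vmatrix} \right] \\
&\quad + t_{1,3} \cdot \left[ t_{2,1}t_{3,2} \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} + t_{2,2}t_{3,1} \begin{vmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{vmatrix} \right]
\end{aligned}$$

The point of the swapping (one swap to each of the permutation matrices on
the second line and two swaps to each on the third line) is that the three lines
simplify to three terms.

$$= t_{1,1} \cdot \begin{vmatrix} t_{2,2} & t_{2,3} \\ t_{3,2} & t_{3,3} \end{vmatrix} - t_{1,2} \cdot \begin{vmatrix} t_{2,1} & t_{2,3} \\ t_{3,1} & t_{3,3} \end{vmatrix} + t_{1,3} \cdot \begin{vmatrix} t_{2,1} & t_{2,2} \\ t_{3,1} & t_{3,2} \end{vmatrix}$$

The formula given in Theorem 1.5, which generalizes this example, is a
recurrence: the determinant is expressed as a combination of determinants. This
formula isn't circular because, as here, the determinant is expressed in terms of
determinants of matrices of smaller size.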



1.2 Definition For any $n \times n$ matrix $T$, the $(n-1) \times (n-1)$ matrix formed by
deleting row $i$ and column $j$ of $T$ is the $i,j$ minor of $T$. The $i,j$ cofactor $T_{i,j}$ of
$T$ is $(-1)^{i+j}$ times the determinant of the $i,j$ minor of $T$.

1.3 Example The 1,2 cofactor of the matrix from Example 1.1 is the negative of
the second $2 \times 2$ determinant.

$$T_{1,2} = -1 \cdot \begin{vmatrix} t_{2,1} & t_{2,3} \\ t_{3,1} & t_{3,3} \end{vmatrix}$$


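1.4 Example Where

$$T = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}$$

these are the 1,2 and 2,2 cofactors.

$$T_{1,2} = (-1)^{1+2} \cdot \begin{vmatrix} 4 & 6 \\ 7 & 9 \end{vmatrix} = 6 \qquad T_{2,2} = (-1)^{2+2} \cdot \begin{vmatrix} 1 & 3 \\ 7 & 9 \end{vmatrix} = -12$$
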

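1.5 Theorem (Laplace Expansion of Determinants) Where $T$ is an $n \times n$ matrix, we
can find the determinant by expanding by cofactors on any row $i$ or column $j$.

$$\begin{aligned}
|T| &= t_{i,1} \cdot T_{i,1} + t_{i,2} \cdot T_{i,2} + \cdots + t_{i,n} \cdot T_{i,n} \\
&= t_{1,j} \cdot T_{1,j} + t_{2,j} \cdot T_{2,j} + \cdots + t_{n,j} \cdot T_{n,j}
\end{aligned}$$

Proof Exercise 1.27. QED

1.6 Example We can compute the determinant

$$|T| = \begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{vmatrix}$$

by expanding along the first row, as in Example 1.1

$$|T| = 1 \cdot (+1) \begin{vmatrix} 5 & 6 \\ 8 & 9 \end{vmatrix} + 2 \cdot (-1) \begin{vmatrix} 4 & 6 \\ 7 & 9 \end{vmatrix} + 3 \cdot (+1) \begin{vmatrix} 4 & 5 \\ 7 & 8 \end{vmatrix} = -3 + 12 - 9 = 0$$

Alternatively, we can expand down the second column.

$$|T| = 2 \cdot (-1) \begin{vmatrix} 4 & 6 \\ 7 & 9 \end{vmatrix} + 5 \cdot (+1) \begin{vmatrix} 1 & 3 \\ 7 & 9 \end{vmatrix} + 8 \cdot (-1) \begin{vmatrix} 1 & 3 \\ 4 & 6 \end{vmatrix} = 12 - 60 + 48 = 0$$
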



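1.7 Example A row or column with many zeroes suggests a Laplace expansion.

$$\begin{vmatrix} 1 & 5 & 0 \\ 2 & 1 & 1 \\ 3 & -1 & 0 \end{vmatrix} = 0 \cdot (+1) \begin{vmatrix} 2 & 1 \\ 3 & -1 \end{vmatrix} + 1 \cdot (-1) \begin{vmatrix} 1 & 5 \\ 3 & -1 \end{vmatrix} + 0 \cdot (+1) \begin{vmatrix} 1 & 5 \\ 2 & 1 \end{vmatrix} = 16$$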
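
Because Laplace's expansion is a recurrence, it has a natural recursive implementation. The Python sketch below is an illustrative addition, not the book's code; it uses 0-based row and column indices, and the names `minor` and `det` are assumptions made here.

```python
def minor(T, i, j):
    """Delete row i and column j (both 0-based) from T."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(T) if r != i]


def det(T, row=0):
    """Determinant by cofactor expansion along the given row (0-based)."""
    if len(T) == 1:
        return T[0][0]
    return sum(
        (-1) ** (row + j) * T[row][j] * det(minor(T, row, j))
        for j in range(len(T))
    )


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # the matrix of Example 1.6
B = [[1, 5, 0], [2, 1, 1], [3, -1, 0]]  # the matrix of Example 1.7
print([det(A, r) for r in range(3)])
print([det(B, r) for r in range(3)])
```

Expanding along any of the three rows gives the same value, as Theorem 1.5 says it must.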

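We finish by applying this result to derive a new formula for the inverse
of a matrix. With Theorem 1.5, we can calculate the determinant of an $n \times n$
matrix $T$ by taking linear combinations of entries from a row and their associated
cofactors.

$$t_{i,1} \cdot T_{i,1} + t_{i,2} \cdot T_{i,2} + \cdots + t_{i,n} \cdot T_{i,n} = |T| \qquad (*)$$

Recall that a matrix with two identical rows has a zero determinant. Thus, for
any matrix $T$, weighting the cofactors from row $k$ by the entries from row $i$, with $k \neq i$, gives zero

$$t_{i,1} \cdot T_{k,1} + t_{i,2} \cdot T_{k,2} + \cdots + t_{i,n} \cdot T_{k,n} = 0 \qquad (**)$$

because it represents the expansion along the row $k$ of a matrix with row $i$ equal
to row $k$. This summarizes $(*)$ and $(**)$.

$$\begin{pmatrix} t_{1,1} & t_{1,2} & \cdots & t_{1,n} \\ t_{2,1} & t_{2,2} & \cdots & t_{2,n} \\ & \vdots & & \\ t_{n,1} & t_{n,2} & \cdots & t_{n,n} \end{pmatrix}
\begin{pmatrix} T_{1,1} & T_{2,1} & \cdots & T_{n,1} \\ T_{1,2} & T_{2,2} & \cdots & T_{n,2} \\ & \vdots & & \\ T_{1,n} & T_{2,n} & \cdots & T_{n,n} \end{pmatrix}
= \begin{pmatrix} |T| & 0 & \cdots & 0 \\ 0 & |T| & \cdots & 0 \\ & \vdots & & \\ 0 & 0 & \cdots & |T| \end{pmatrix}$$

Note that the order of the subscripts in the matrix of cofactors is opposite to
the order of subscripts in the other matrix; e.g., along the first row of the matrix
of cofactors the subscripts are 1,1 then 2,1, etc.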



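1.8 Definition The matrix adjoint to the square matrix $T$ is

$$\operatorname{adj}(T) = \begin{pmatrix} T_{1,1} & T_{2,1} & \cdots & T_{n,1} \\ T_{1,2} & T_{2,2} & \cdots & T_{n,2} \\ & \vdots & & \\ T_{1,n} & T_{2,n} & \cdots & T_{n,n} \end{pmatrix}$$

where $T_{j,i}$ is the $j,i$ cofactor.

1.9 Theorem Where $T$ is a square matrix, $T \cdot \operatorname{adj}(T) = \operatorname{adj}(T) \cdot T = |T| \cdot I$.

Proof Equations $(*)$ and $(**)$. QED

1.10 Example If

$$T = \begin{pmatrix} 1 & 0 & 4 \\ 2 & 1 & -1 \\ 1 & 0 & 1 \end{pmatrix}$$

then $\operatorname{adj}(T)$ is

$$\begin{pmatrix} T_{1,1} & T_{2,1} & T_{3,1} \\ T_{1,2} & T_{2,2} & T_{3,2} \\ T_{1,3} & T_{2,3} & T_{3,3} \end{pmatrix}
= \begin{pmatrix}
\begin{vmatrix} 1 & -1 \\ 0 & 1 \end{vmatrix} & -\begin{vmatrix} 0 & 4 \\ 0 & 1 \end{vmatrix} & \begin{vmatrix} 0 & 4 \\ 1 & -1 \end{vmatrix} \\[2ex]
-\begin{vmatrix} 2 & -1 \\ 1 & 1 \end{vmatrix} & \begin{vmatrix} 1 & 4 \\ 1 & 1 \end{vmatrix} & -\begin{vmatrix} 1 & 4 \\ 2 & -1 \end{vmatrix} \\[2ex]
\begin{vmatrix} 2 & 1 \\ 1 & 0 \end{vmatrix} & -\begin{vmatrix} 1 & 0 \\ 1 & 0 \end{vmatrix} & \begin{vmatrix} 1 & 0 \\ 2 & 1 \end{vmatrix}
\end{pmatrix}
= \begin{pmatrix} 1 & 0 & -4 \\ -3 & -3 & 9 \\ -1 & 0 & 1 \end{pmatrix}$$

and taking the product with $T$ gives the diagonal matrix $|T| \cdot I$.

$$\begin{pmatrix} 1 & 0 & 4 \\ 2 & 1 & -1 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & -4 \\ -3 & -3 & 9 \\ -1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} -3 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & -3 \end{pmatrix}$$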



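1.11 Corollary If $|T| \neq 0$ then $T^{-1} = (1/|T|) \cdot \operatorname{adj}(T)$.

1.12 Example The inverse of the matrix from Example 1.10 is $(1/{-3}) \cdot \operatorname{adj}(T)$.

$$T^{-1} = \begin{pmatrix} 1/{-3} & 0/{-3} & -4/{-3} \\ -3/{-3} & -3/{-3} & 9/{-3} \\ -1/{-3} & 0/{-3} & 1/{-3} \end{pmatrix} = \begin{pmatrix} -1/3 & 0 & 4/3 \\ 1 & 1 & -3 \\ 1/3 & 0 & -1/3 \end{pmatrix}$$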
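
Corollary 1.11 can be carried out mechanically. The following Python sketch is added for illustration; the function names are assumptions, indices are 0-based, and `fractions.Fraction` is used so that the arithmetic stays exact. It rebuilds the adjoint and inverse of the matrix from Example 1.10.

```python
from fractions import Fraction


def minor(T, i, j):
    """Delete row i and column j (both 0-based) from T."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(T) if r != i]


def det(T):
    """Determinant by Laplace expansion along the first row."""
    if len(T) == 1:
        return T[0][0]
    return sum((-1) ** j * T[0][j] * det(minor(T, 0, j)) for j in range(len(T)))


def cofactor(T, i, j):
    """The i,j cofactor: a signed determinant of the i,j minor."""
    return (-1) ** (i + j) * det(minor(T, i, j))


def adj(T):
    """Matrix adjoint: entry (i, j) is the j,i cofactor (note the transposition)."""
    n = len(T)
    return [[cofactor(T, j, i) for j in range(n)] for i in range(n)]


def inverse(T):
    """Inverse as (1/|T|) times adj(T); raises ValueError if |T| is zero."""
    size = det(T)
    if size == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(entry, size) for entry in row] for row in adj(T)]


T = [[1, 0, 4], [2, 1, -1], [1, 0, 1]]
print(det(T))
print(adj(T))
print(inverse(T))
```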

The formulas from this section are often used for by-hand calculation and
are sometimes useful with special types of matrices. However, they are not the
best choice for computation with arbitrary matrices because they require more
arithmetic than, for instance, the Gauss-Jordan method.
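
To get a rough feel for the difference in arithmetic, the sketch below counts operations under a deliberately simplified model. It is an illustrative assumption rather than a figure from the text: it counts only multiplications and divisions, looks at computing a single determinant rather than an inverse, and compares full cofactor expansion with reduction to triangular form followed by multiplying down the diagonal.

```python
def laplace_mults(n):
    """Multiplications used by full cofactor expansion of an n x n determinant."""
    if n == 1:
        return 0
    # n cofactors, each an (n-1) x (n-1) determinant, plus one multiply apiece
    return n * (laplace_mults(n - 1) + 1)


def gauss_mults(n):
    """Multiplications and divisions for Gauss's Method plus the diagonal product."""
    ops = 0
    for below in range(n - 1, 0, -1):  # number of rows under the current pivot
        # each such row needs one multiplier and then `below` entry updates
        ops += below * (1 + below)
    return ops + (n - 1)  # multiply the n diagonal entries together


for n in (3, 4, 10):
    print(n, laplace_mults(n), gauss_mults(n))
```

Under this model the cofactor count grows like $n!$ while the elimination count grows like $n^3$, which is the point of the remark above.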

Exercises 

/ 1.13 Find the cofactor. 

-Hi J) 

(a) $T_{2,3}$ (b) $T_{3,2}$ (c) $T_{1,3}$
/ 1.14 Find the determinant by expanding 

3 1 
1 2 2 

-13 

(a) on the first row (b) on the second row (c) on the third column. 
1.15 Find the adjoint of the matrix in Example 1.6. 
/ 1.16 Find the matrix adjoint to each.
...




(b) 



(c) 



/ 1.17 Find the inverse of each matrix in the prior question with Theorem 1.9. 
1.18 Find the matrix adjoint to this one.

$$\begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}$$




/ 1.19 Expand across the first row to derive the formula for the determinant of a 2x2 
matrix. 

/ 1.20 Expand across the first row to derive the formula for the determinant of a 3x3 
matrix. 

/ 1.21 (a) Give a formula for the adjoint of a 2x2 matrix. 

(b) Use it to derive the formula for the inverse. 
/ 1.22 Can we compute a determinant by expanding down the diagonal? 

1.23 Give a formula for the adjoint of a diagonal matrix. 
/ 1.24 Prove that the transpose of the adjoint is the adjoint of the transpose. 

1.25 Prove or disprove: adj(adj(T)) = T. 

1.26 A square matrix is upper triangular if each i, j entry is zero in the part below
the diagonal, that is, when i > j.

(a) Must the adjoint of an upper triangular matrix be upper triangular? Lower
triangular?

(b) Prove that the inverse of an upper triangular matrix is upper triangular, if an
inverse exists.

1.27 This question requires material from the optional Determinants Exist subsection.
Prove Theorem 1.5 by using the permutation expansion.

1.28 Prove that the determinant of a matrix equals the determinant of its transpose 
using Laplace's expansion and induction on the size of the matrix. 

? 1.29 Show that 



-1 
1 
1 





1 

1 



1 



-1 
1 



1 



where $F_n$ is the n-th term of 1, 1, 2, 3, 5, . . . , x, y, x + y, . . . , the Fibonacci sequence,
and the determinant is of order $n - 1$. [Am. Math. Mon., Jun. 1949]



Cramer's Rule 



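We have seen that a linear system

$$\begin{aligned} x_1 + 2x_2 &= 6 \\ 3x_1 + x_2 &= 8 \end{aligned}$$

is equivalent to a linear relationship among vectors.

$$x_1 \cdot \begin{pmatrix} 1 \\ 3 \end{pmatrix} + x_2 \cdot \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 6 \\ 8 \end{pmatrix}$$

This pictures that vector equation. A parallelogram with sides formed from $\binom{1}{3}$
and $\binom{2}{1}$ is nested inside a parallelogram with sides formed from $x_1\binom{1}{3}$ and $x_2\binom{2}{1}$.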





That is, we can restate the algebraic question of finding the solution of a linear 
system in geometric terms: by what factors xi and X2 must we dilate the vectors 
to expand the small parallelogram so that it will fill the larger one? 

We can apply the geometric significance of determinants to that picture to 
get a new formula. Compare the sizes of these shaded boxes. 
The second is defined by the vectors $x_1\binom{1}{3}$ and $\binom{2}{1}$, and one of the properties of
the size function (the determinant) is that therefore the size of the second
box is $x_1$ times the size of the first box. Since the third box is defined by the
vector $x_1\binom{1}{3} + x_2\binom{2}{1} = \binom{6}{8}$ and the vector $\binom{2}{1}$, and since the determinant does
not change when we add $x_2$ times the second column to the first column, the
size of the third box equals that of the second.

$$\begin{vmatrix} 6 & 2 \\ 8 & 1 \end{vmatrix}
= \begin{vmatrix} x_1 \cdot 1 & 2 \\ x_1 \cdot 3 & 1 \end{vmatrix}
= x_1 \cdot \begin{vmatrix} 1 & 2 \\ 3 & 1 \end{vmatrix}$$

Solving gives the value of one of the variables.

$$x_1 = \frac{\begin{vmatrix} 6 & 2 \\ 8 & 1 \end{vmatrix}}{\begin{vmatrix} 1 & 2 \\ 3 & 1 \end{vmatrix}} = \frac{-10}{-5} = 2$$



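The generalization of this example is Cramer's Rule: if $|A| \neq 0$ then the
system $A\vec{x} = \vec{b}$ has the unique solution $x_i = |B_i|/|A|$ where the matrix $B_i$ is
formed from A by replacing column i with the vector $\vec{b}$. The proof is Exercise 3.

For instance, to solve this system for $x_2$

$$\begin{pmatrix} 1 & 0 & 4 \\ 2 & 1 & -1 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 2 \\ 1 \\ -1 \end{pmatrix}$$

we do this computation.

$$x_2 = \frac{\begin{vmatrix} 1 & 2 & 4 \\ 2 & 1 & -1 \\ 1 & -1 & 1 \end{vmatrix}}{\begin{vmatrix} 1 & 0 & 4 \\ 2 & 1 & -1 \\ 1 & 0 & 1 \end{vmatrix}} = \frac{-18}{-3} = 6$$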



Cramer's Rule allows us to solve simple two equations/two unknowns systems
by eye (they must be simple in that we can mentally compute with the numbers
in the system). With practice a person can also do simple three equations/three
unknowns systems. But computing large determinants takes a long time so
solving large systems by Cramer's Rule is not practical.
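
For readers who want to experiment, the rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names det and cramer are ours, the coefficients are assumed to be integers, and the determinant is computed by a naive Laplace expansion, so the sketch is suitable only for small systems.

```python
from fractions import Fraction

def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        sub = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(sub)
    return total

def cramer(A, b):
    """Solve A x = b by Cramer's Rule; requires |A| != 0."""
    d = det(A)
    if d == 0:
        raise ValueError("|A| = 0, so Cramer's Rule does not apply")
    solution = []
    for i in range(len(A)):
        # B_i is A with column i replaced by the vector b.
        B_i = [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]
        solution.append(Fraction(det(B_i), d))
    return solution

# The two systems discussed in this Topic.
print(cramer([[1, 2], [3, 1]], [6, 8]))
print(cramer([[1, 0, 4], [2, 1, -1], [1, 0, 1]], [2, 1, -1]))
```

The first call returns the solution with x1 = 2 and the second returns the solution with x2 = 6, agreeing with the hand computations above.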



Exercises 



1 Use Cramer's Rule to solve each for each of the variables. 
(a) y= 4 -2x+ y=-2

2 Use Cramer's Rule to solve this system for z. 

2x + y + z = 1 
3x + z = 4 
x-y -z=2 

3 Prove Cramer's Rule. 

4 Here is an alternative proof of Cramer's Rule that doesn't overtly contain any
geometry. Write $X_i$ for the identity matrix with column i replaced by the vector $\vec{x}$
of unknowns $x_1, \ldots, x_n$.

(a) Observe that $AX_i = B_i$.

(b) Take the determinant of both sides. 

5 Suppose that a linear system has as many equations as unknowns, that all of 
its coefficients and constants are integers, and that its matrix of coefficients has 
determinant 1. Prove that the entries in the solution are all integers. (Remark. 
This is often used to invent linear systems for exercises. If an instructor makes 
the linear system with this property then the solution is not some disagreeable 
fraction.) 

6 Use Cramer's Rule to give a formula for the solution of a two equations/two 
unknowns linear system. 

7 Can Cramer's Rule tell the difference between a system with no solutions and one 
with infinitely many? 

8 The first picture in this Topic (the one that doesn't use determinants) shows a 
unique solution case. Produce a similar picture for the case of infinitely many 
solutions, and the case of no solutions. 



Speed of Calculating Determinants 



The permutation expansion formula for computing determinants is useful for 
proving theorems, but the method of using row operations is much better for 
finding the determinants of a large matrix. We can make this statement precise 
by considering, as computer algorithm designers do, the number of arithmetic 
operations that each method uses. 

We measure the speed of an algorithm by finding how the time taken by 
the computer grows as the size of its input data set grows. For instance, if we 
increase the size of the input data by a factor of ten does the time taken by the 
computer grow by a factor of ten, or by a factor of a hundred, or by a factor of 
a thousand? That is, is the time proportional to the size of the data set, or to 
the square of that size, or to the cube of that size, etc.? 

Recall the permutation expansion formula for determinants.

$$\begin{vmatrix} t_{1,1} & t_{1,2} & \cdots & t_{1,n} \\ t_{2,1} & t_{2,2} & \cdots & t_{2,n} \\ \vdots & & & \vdots \\ t_{n,1} & t_{n,2} & \cdots & t_{n,n} \end{vmatrix}
= \sum_{\text{permutations } \phi} t_{1,\phi(1)} t_{2,\phi(2)} \cdots t_{n,\phi(n)} \, |P_\phi|$$

There are $n! = n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1$ different n-permutations. This factorial
function grows quickly; for instance when n is only 10 then the expansion
above has 10! = 3,628,800 terms, each with n multiplications. Doing n! many
operations is doing more than $n^2$ many operations (roughly: multiplying the
first two factors in n! gives $n \cdot (n-1)$, which for large n is approximately
$n^2$, and then multiplying in more factors will make the factorial even larger).
Similarly, the factorial function grows faster than the cube or the fourth power
or any polynomial function. So a computer program that uses the permutation
expansion formula, and thus performs a number of operations that is greater
than or equal to the factorial of the number of rows, would be very slow. It
would take a time longer than the square of the number of rows, longer than
the cube, etc.

In contrast, the time taken by the row reduction method does not grow
so fast. The fragment of row-reduction code shown below is in the computer
language FORTRAN, which is widely used for numeric code. The matrix is in
the N×N array A. The program's outer loop runs through each ROW between 1
and N-1 and does the entry-by-entry combination $-(\text{A(I,ROW)} \cdot \text{PIVINV}) \cdot \rho_{\text{ROW}} + \rho_{\text{I}}$ with the
lower rows.

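      DO 10 ROW=1, N-1
        PIVINV=1.0/A(ROW,ROW)
        DO 20 I=ROW+1, N
C         Multiplier that clears the entry below the pivot.
          FACTOR=A(I,ROW)*PIVINV
C         Entries at or left of column ROW are never read again.
          DO 30 J=ROW+1, N
            A(I,J)=A(I,J)-FACTOR*A(ROW,J)
   30     CONTINUE
   20   CONTINUE
   10 CONTINUE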

(This code is naive; for example it does not handle the case that A(ROW,ROW)
is zero. Analysis of a finished version that includes all of the tests and subcases
is messier but gives the same conclusion.) For each ROW, the nested I and J
loops perform the combination with the lower rows by doing arithmetic on the
entries in A that are below and to the right of A(ROW,ROW). There are $(N - \text{ROW})^2$
such entries. On average, ROW will be N/2. Therefore, for each ROW this program will
perform the arithmetic about $(N/2)^2$ times, that is, each pass through the outer
loop will run in a time proportional to the square of the number of equations.
Taking into account the outer loop, we estimate that the running time of the
algorithm is proportional to the cube of the number of equations.

Finding the fastest algorithm to compute the determinant is a topic of current 
research. So far, people have found algorithms that run in time between the 
square and cube of N. 

The contrast between these two methods for computing determinants makes 
the point that although in principle they give the same answer, in practice we 
want the one that is fast. 
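
To see the contrast in numbers, here is a small Python sketch that counts the multiplications and divisions done by each method. It is an illustration only: the function names and the family of test matrices (n+1 on the diagonal, 1 elsewhere, chosen by us so that no zero pivot arises) are assumptions, and the row reduction does no row swaps.

```python
from fractions import Fraction
from itertools import permutations

def det_permutation(M):
    """Permutation expansion. Returns (determinant, multiplication count)."""
    n = len(M)
    total, mults = 0, 0
    for phi in permutations(range(n)):
        # The sign of the permutation is the parity of its inversion count.
        inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                         if phi[a] > phi[b])
        term = -1 if inversions % 2 else 1
        for row in range(n):
            term *= M[row][phi[row]]
            mults += 1
        total += term
    return total, mults

def det_row_reduction(M):
    """Row reduction without swaps. Returns (determinant, mult/div count).

    Assumes that no zero pivot arises.
    """
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    ops = 0
    for row in range(n - 1):
        pivinv = 1 / A[row][row]
        ops += 1
        for i in range(row + 1, n):
            factor = A[i][row] * pivinv
            ops += 1
            for j in range(row + 1, n):
                A[i][j] -= factor * A[row][j]
                ops += 1
    d = Fraction(1)
    for k in range(n):   # the determinant is the product down the diagonal
        d *= A[k][k]
        ops += 1
    return d, ops

if __name__ == "__main__":
    for n in range(2, 8):
        M = [[n + 1 if i == j else 1 for j in range(n)] for i in range(n)]
        d1, m1 = det_permutation(M)
        d2, m2 = det_row_reduction(M)
        assert d1 == d2
        print(n, m1, m2)
```

The second column of the printed table grows like n times n!, while the third grows roughly like the cube of n.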

Exercises 

Most of these presume access to a computer. 

1 Computer systems generate random numbers (of course, these are only pseudo- 
random, in that they come from an algorithm, but they pass a number of reasonable 
statistical tests for randomness). 

(a) Fill a 5x5 array with random numbers (say, in the range [0..1)). See if it is 
singular. Repeat that experiment a few times. Are singular matrices frequent or 
rare (in this sense)? 

(b) Time your computer algebra system at finding the determinant of ten 5x5 
arrays of random numbers. Find the average time per array. Repeat the prior 
item for 15x15 arrays, 25x25 arrays, 35x35 arrays, etc. You may find that you 
need to get above a certain size to get a timing that you can use. (Notice that, 
when an array is singular, we can sometimes decide that quickly, for instance if 
the first row equals the second. In the light of your answer to the first part, do 
you expect that singular systems play a large role in your average?) 

(c) Graph the input size versus the average time. 

2 Compute the determinant of each of these by hand using the two methods discussed
above.

(a) $\begin{vmatrix} 2 & 1 \\ 5 & -3 \end{vmatrix}$
(b) $\begin{vmatrix} 3 & 1 & 1 \\ -1 & 0 & 5 \\ -1 & 2 & -2 \end{vmatrix}$
(c) $\begin{vmatrix} 2 & 1 & 0 & 0 \\ 1 & 3 & 2 & 0 \\ 0 & -1 & -2 & 1 \\ 0 & 0 & -2 & 1 \end{vmatrix}$

Count the number of multiplications and divisions used in each case, for each of
the methods. (On a computer, multiplications and divisions take much longer than
additions and subtractions, so algorithm designers worry about them more.)

3 What 10x10 array can you invent that takes your computer system the longest 
to reduce? The shortest? 

4 The FORTRAN language specification requires that arrays be stored "by column," 
that is, the entire first column is stored contiguously, then the second column, etc. 
Does the code fragment given take advantage of this, or can it be rewritten to 
make it faster, by taking advantage of the fact that computer fetches are faster 
from contiguous locations? 



Chio's Method 



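When doing Gauss's Method on a matrix that contains only integers

$$A = \begin{pmatrix} 2 & 1 & 1 \\ 3 & 4 & -1 \\ 1 & 5 & 1 \end{pmatrix}$$

people often prefer to keep it that way. Instead of the row operations $-(3/2)\rho_1 + \rho_2$
and $-(1/2)\rho_1 + \rho_3$ they may start by multiplying the rows below the top
one by 2

$$\xrightarrow{2\rho_2,\ 2\rho_3} \begin{pmatrix} 2 & 1 & 1 \\ 6 & 8 & -2 \\ 2 & 10 & 2 \end{pmatrix} \qquad (*)$$

and then the elimination in the first column goes like this.

$$\xrightarrow{-3\rho_1 + \rho_2,\ -\rho_1 + \rho_3} \begin{pmatrix} 2 & 1 & 1 \\ 0 & 5 & -5 \\ 0 & 9 & 1 \end{pmatrix} \qquad (**)$$
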

An all-integer approach is easier for mental calculations. And, using integer 
arithmetic on a computer avoids some sticky issues involving floating point 
calculations [Kahan]. So there are sound reasons to take this approach. 

Another reason comes from observing that we can easily apply Laplace's 
expansion to the first column of (**) and then we get the determinant of A by 
remembering to divide by 4 because of (*). 

Here is the general 3×3 case of this approach to finding the determinant.
First rescale all rows except the top one.

$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{pmatrix}
\xrightarrow{a_{1,1}\rho_2,\ a_{1,1}\rho_3}
\begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1}a_{1,1} & a_{2,2}a_{1,1} & a_{2,3}a_{1,1} \\ a_{3,1}a_{1,1} & a_{3,2}a_{1,1} & a_{3,3}a_{1,1} \end{pmatrix}$$

This rescales the determinant by $a_{1,1}^2$. Now eliminate down the first column.

$$\xrightarrow{-a_{2,1}\rho_1 + \rho_2,\ -a_{3,1}\rho_1 + \rho_3}
\begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ 0 & a_{2,2}a_{1,1} - a_{2,1}a_{1,2} & a_{2,3}a_{1,1} - a_{2,1}a_{1,3} \\ 0 & a_{3,2}a_{1,1} - a_{3,1}a_{1,2} & a_{3,3}a_{1,1} - a_{3,1}a_{1,3} \end{pmatrix}$$

Let C be the 1,1 minor. By Laplace the determinant of the above matrix is
$a_{1,1} \det(C)$. We thus have $a_{1,1}^2 \det(A) = a_{1,1} \det(C)$ and if $a_{1,1} \neq 0$ then this
3×3 case gives $\det(A) = \det(C)/a_{1,1}$.

To expand this approach to n×n matrices with n > 3 we must see how to
compute the minor's entries. The pattern is: each element of the minor is a
2×2 determinant. For instance, the entry in the minor's upper left $a_{2,2}a_{1,1} - a_{2,1}a_{1,2}$,
which is the 2,2 entry in the above matrix, is the determinant of the
matrix of these four elements of A.

$$\begin{vmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{vmatrix}$$

And the minor's lower left, the 3,2 entry from above, is the determinant of the
matrix of these four.

$$\begin{vmatrix} a_{1,1} & a_{1,2} \\ a_{3,1} & a_{3,2} \end{vmatrix}$$

So, where A is n×n for $n \geq 3$, we let Chiò's matrix C be the (n−1)×(n−1)
matrix whose i, j entry is the determinant

$$\begin{vmatrix} a_{1,1} & a_{1,j+1} \\ a_{i+1,1} & a_{i+1,j+1} \end{vmatrix}$$

where $1 \leq i, j \leq n-1$. Chiò's Method for finding the determinant of A is that if
$a_{1,1} \neq 0$ then $\det(A) = \det(C)/a_{1,1}^{n-2}$.

By the way, nothing in Chiò's formula requires that the numbers be integers;
it applies to reals as well.

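We illustrate by finding the determinant of this 3×3 matrix.

$$A = \begin{pmatrix} 2 & 1 & 1 \\ 3 & 4 & -1 \\ 1 & 5 & 1 \end{pmatrix}$$

We derive this Chiò's matrix.

$$C = \begin{pmatrix}
\begin{vmatrix} 2 & 1 \\ 3 & 4 \end{vmatrix} & \begin{vmatrix} 2 & 1 \\ 3 & -1 \end{vmatrix} \\
\begin{vmatrix} 2 & 1 \\ 1 & 5 \end{vmatrix} & \begin{vmatrix} 2 & 1 \\ 1 & 1 \end{vmatrix}
\end{pmatrix}
= \begin{pmatrix} 5 & -5 \\ 9 & 1 \end{pmatrix}$$

The formula for 3×3 matrices $\det(A) = \det(C)/a_{1,1}$ gives $\det(A) = (50/2) = 25$.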

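For a larger determinant we must do multiple steps but each involves only
2×2 determinants and so we can often write down only some intermediate
information. For instance, given this 4×4 matrix

$$A = \begin{pmatrix} 3 & 0 & 1 & 1 \\ 1 & 2 & 0 & 1 \\ 2 & -1 & 0 & 3 \\ 1 & 0 & 0 & 1 \end{pmatrix}$$

we can find Chiò's matrix by mentally doing each of the 2×2 calculations and
only noting the 3×3 result.

$$C_3 = \begin{pmatrix}
\begin{vmatrix} 3 & 0 \\ 1 & 2 \end{vmatrix} & \begin{vmatrix} 3 & 1 \\ 1 & 0 \end{vmatrix} & \begin{vmatrix} 3 & 1 \\ 1 & 1 \end{vmatrix} \\
\begin{vmatrix} 3 & 0 \\ 2 & -1 \end{vmatrix} & \begin{vmatrix} 3 & 1 \\ 2 & 0 \end{vmatrix} & \begin{vmatrix} 3 & 1 \\ 2 & 3 \end{vmatrix} \\
\begin{vmatrix} 3 & 0 \\ 1 & 0 \end{vmatrix} & \begin{vmatrix} 3 & 1 \\ 1 & 0 \end{vmatrix} & \begin{vmatrix} 3 & 1 \\ 1 & 1 \end{vmatrix}
\end{pmatrix}
= \begin{pmatrix} 6 & -1 & 2 \\ -3 & -2 & 7 \\ 0 & -1 & 2 \end{pmatrix}$$

We should also note that the determinant of this is $a_{1,1}^{4-2} = 3^2$ times the
determinant of the 4×4 matrix A.

We can finish by repeating the process with the 3×3 matrix. This is Chiò's
matrix of it; note that the determinant of this matrix is 6 times the determinant
of $C_3$.

$$C_2 = \begin{pmatrix}
\begin{vmatrix} 6 & -1 \\ -3 & -2 \end{vmatrix} & \begin{vmatrix} 6 & 2 \\ -3 & 7 \end{vmatrix} \\
\begin{vmatrix} 6 & -1 \\ 0 & -1 \end{vmatrix} & \begin{vmatrix} 6 & 2 \\ 0 & 2 \end{vmatrix}
\end{pmatrix}
= \begin{pmatrix} -15 & 48 \\ -6 & 12 \end{pmatrix}$$

The determinant of $C_2$ is 108. We have $\det(A) = 108/(3^2 \cdot 6) = 2$.

Laplace's expansion formula for evaluating determinants is recursive because
it reduces the calculation of an n×n determinant to the evaluation of a number
of (n−1)×(n−1) ones. Chiò's formula is also recursive, so it is similar in spirit,
but it reduces an n×n determinant to a single (n−1)×(n−1) determinant, the
calculation of which requires a number of 2×2 determinants. However, for large
matrices Gauss's Method is better than either; for instance, it takes roughly half
as many operations as Chiò's Method [Fuller & Logan].
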



Exercises 



1 Use Chiò's Method to find each determinant.

(a)

(b)

2 What if $a_{1,1}$ is zero?

3 The Rule of Sarrus is a mnemonic that students often learn in prior courses
for the 3×3 determinant formula. On the right of the matrix copy the first two
columns.

$$\begin{array}{ccc|cc} a & b & c & a & b \\ d & e & f & d & e \\ g & h & i & g & h \end{array}$$

The determinant is the sum of the three upper-left to lower-right diagonals minus
the three lower-left to upper-right diagonals $aei + bfg + cdh - gec - hfa - idb$.
Count the operations involved in Sarrus's formula and Chiò's formula for the 3×3
case and see which uses fewer.

4 Prove Chiò's Formula.



Computer Code 

This implements Chiò's Method. It is in the computer language Python but to
make it as readable as possible the code avoids some Python facilities. Note the
recursive call in the final line of chio_det.

    #!/usr/bin/env python3
    # chio.py
    # Calculate a determinant using Chio's method.
    # Jim Hefferon; PD
    # For demonstration only; does not handle the M[0][0]=0 case!

    def det_two(a, b, c, d):
        """Return the determinant of the 2x2 matrix [[a,b], [c,d]]"""
        return a*d - b*c

    def chio_mat(M):
        """Return the Chio matrix as a list of the rows
          M  nxn matrix, list of rows"""
        dim = len(M)
        C = []
        for row in range(1, dim):
            C.append([])
            for col in range(1, dim):
                C[-1].append(det_two(M[0][0], M[0][col], M[row][0], M[row][col]))
        return C

    def chio_det(M, show=None):
        """Find the determinant of M by Chio's method
          M  mxm matrix, list of rows"""
        dim = len(M)
        key_elet = M[0][0]
        if dim == 1:
            return key_elet
        # det(M) = det(C) / key_elet^(dim-2), where C is the Chio matrix
        return chio_det(chio_mat(M)) / (key_elet**(dim-2))

    if __name__ == '__main__':
        M = [[2,1,1], [3,4,-1], [1,5,1]]
        print("M=", M)
        print("Chio det is", chio_det(M))

This is the result of calling the program from my command line.

    $ python3 chio.py
    M= [[2, 1, 1], [3, 4, -1], [1, 5, 1]]
    Chio det is 25.0



Projective Geometry 



There are geometries other than the familiar Euclidean one. One such geometry 
arose in art, where people observed that what a viewer sees is not necessarily 
what is there. This is Leonardo da Vinci's The Last Supper. 




Look at where the ceiling meets the left and right walls. In the room those lines 
are parallel but as viewers we see lines that, if extended, would intersect. The 
intersection point is the vanishing point. This aspect of perspective is familiar 
as an image of railroad tracks that appear to converge at the horizon. 

To depict the room da Vinci has adopted a model of how we see, of how we 
project the three dimensional scene to a two dimensional image. This model 
is only a first approximation: it does not take into account the curve of our 
retina, that our lens bends the light, that we have binocular vision, or that our 
brain's processing greatly affects what we see. Nonetheless it is interesting, both 
artistically and mathematically. 

This is a central projection from a single point to the plane of the canvas. 
It is not an orthogonal projection since the line from the viewer to C is not
orthogonal to the image plane.

The operation of central projection preserves some geometric properties, for 
instance lines project to lines. However, it fails to preserve some others, for 
instance equal length segments can project to segments of unequal length (AB 
is longer than BC, because the segment projected to AB is closer to the viewer 
and closer things look bigger). The study of the effects of central projections is 
projective geometry. 

There are three cases of central projection. The first is the projection done 
by a movie projector. 




projector P source S image I 



We can think that each source point is "pushed" from the domain plane outward 
to the image point in the codomain plane. The second case of projection is that 
of the artist "pulling" the source back to the canvas. 




painter P image I source S 

The two are different because in the first case S is in the middle while in the 
second case I is in the middle. One more configuration is possible, with P in the 
middle. An example of this is when we use a pinhole to shine the image of a 
solar eclipse onto a piece of paper. 




source S pinhole P image I 
Although the three are not exactly the same, they are similar. We shall say
that each is a central projection by P of S to I. We next look at three models of 
central projection, of increasing abstractness but also of increasing uniformity. 

The last will bring out the linear algebra. 

Consider again the effect of railroad tracks that appear to converge to a point. 
We model this with parallel lines in a domain plane S and a projection via a P 
to a codomain plane I. (The gray lines are parallel to S and I planes.) 




All three projection cases appear in this one setting. The first picture below 
shows P acting as a movie projector by pushing points from part of S out to 
image points on the lower half of I. The middle picture shows P acting as the 
artist by pulling points from another part of S back to image points in the 
middle of I. In the third picture P acts as the pinhole, projecting points from 
S to the upper part of I. This third picture is the trickiest — the points that 
are projected near to the vanishing point are the ones that are far out on the 
bottom left of S. Points in S that are near to the vertical gray line are sent high 
up on I. 







There are two awkward things about this situation. The first is that neither 
of the two points in the domain nearest to the vertical gray line (see below) 
has an image because a projection from those two is along the gray line that is 
parallel to the codomain plane (we sometimes say that these two are projected 
to infinity). The second awkward thing is that the vanishing point in I isn't the 
image of any point from S because a projection to this point would be along 
the gray line that is parallel to the domain plane (we sometimes say that the 
vanishing point is the image of a projection "from infinity"). 
For a model that eliminates this awkwardness, put the projector P at the
origin. Imagine that we cover P with a glass hemispheric dome. As P looks
outward, anything in the line of vision is projected to the same spot on the
dome. This includes things on the line between P and the dome, as in the case
of projection by the movie projector. It includes things on the line further from
P than the dome, as in the case of projection by the painter. It also includes
things on the line that lie behind P, as in the case of projection by a pinhole.

$$\ell = \{\, k \cdot \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \mid k \in \mathbb{R} \,\}$$

From this perspective P, all of the spots on the line are the same point. Accordingly,
for any nonzero vector $\vec{v} \in \mathbb{R}^3$, we define the associated point v in the
projective plane to be the set $\{k\vec{v} \mid k \in \mathbb{R} \text{ and } k \neq 0\}$ of nonzero vectors lying
on the same line through the origin as $\vec{v}$. To describe a projective point we can
give any representative member of the line, so that the projective point shown
above can be represented in any of these three ways.





Each of these is a homogeneous coordinate vector for v. 

This picture and definition clarify the description of central projection but
there is something awkward about the dome model: what if the viewer looks
down? If we draw P's line of sight so that the part coming toward us, out of the
page, goes down below the dome then we can trace the line of sight backward,
up past P and toward the part of the hemisphere that is behind the page. So
in the dome model, looking down gives a projective point that is behind the
viewer. Therefore, if the viewer in the picture above drops the line of sight
toward the bottom of the dome then the projective point drops also and as the
line of sight continues down past the equator, the projective point suddenly
shifts from the front of the dome to the back of the dome. (This brings out that
in fact the dome is not quite an entire hemisphere, or else when the viewer is
looking exactly along the equator then there are two points in the line on the
dome. Instead we define it so that the points on the equator with a positive y
coordinate, as well as the point where y = 0 and x is positive, are on the dome
but the other equatorial points are not.) This discontinuity means that we often
have to treat equatorial points as a separate case. That is, while the railroad
track discussion of central projection has three cases, the dome model has two.

We can do better, we can reduce to having no separate cases. Consider a 
sphere centered at the origin. Any line through the origin intersects the sphere 
in two spots, which are antipodal. Because we associate each line through the 
origin with a point in the projective plane, we can draw such a point as a pair 
of antipodal spots on the sphere. Below, we show the two antipodal spots 
connected by a dashed line to emphasize that they are not two different points, 
the pair of spots together make one projective point. 




While drawing a point as a pair of antipodal spots is not as natural as the
one-spot-per-point dome model, on the other hand the awkwardness of the dome
model is gone, in that as a line of view slides from north to south, no sudden
changes happen. This model of central projection is uniform.

So far we have described points in projective geometry. What about lines? 
What a viewer at the origin sees as a line is shown below as a great circle, the 
intersection of the model sphere with a plane through the origin. 




(We've included one of the projective points on this line to bring out a subtlety.
Because two antipodal spots together make up a single projective point, the
great circle's behind-the-paper part is the same set of projective points as its
in-front-of-the-paper part.) Just as we did with each projective point, we will
also describe a projective line with a triple of reals. For instance, the members
of this plane through the origin in $\mathbb{R}^3$

$$\{\, \begin{pmatrix} x \\ y \\ z \end{pmatrix} \mid x + y - z = 0 \,\}$$

project to a line that we can describe with the row vector $(1 \;\; 1 \;\; {-1})$ (we use a
row vector to typographically set lines apart from points). In general, for any
nonzero three-wide row vector $\vec{L}$ we define the associated line in the projective
plane to be the set $L = \{k\vec{L} \mid k \in \mathbb{R} \text{ and } k \neq 0\}$ of nonzero multiples of $\vec{L}$.

The reason that this description of a line as a triple is convenient is that
in the projective plane, a point v and a line L are incident (the point lies
on the line, the line passes through the point) if and only if a dot product
of their representatives $v_1L_1 + v_2L_2 + v_3L_3$ is zero (Exercise 4 shows that this
is independent of the choice of representatives $\vec{v}$ and $\vec{L}$). For instance, the
projective point described above by the column vector with components 1, 2,
and 3 lies in the projective line described by $(1 \;\; 1 \;\; {-1})$, simply because any
vector in $\mathbb{R}^3$ whose components are in ratio 1 : 2 : 3 lies in the plane through the
origin whose equation is of the form $1k \cdot x + 1k \cdot y - 1k \cdot z = 0$ for any nonzero k.
That is, the incidence formula is inherited from the three-space lines and planes
of which v and L are projections.
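
The incidence test is easy to try out. The following Python sketch is only an illustration: the function names are our own, and points and lines are represented as 3-tuples of integers standing for homogeneous coordinate vectors. It checks incidence by the dot product and also checks that two triples represent the same projective point by testing whether they are proportional.

```python
def incident(v, L):
    """True if the point represented by v lies on the line represented by L."""
    return sum(vi * Li for vi, Li in zip(v, L)) == 0

def same_projective_object(v, w):
    """True if nonzero triples v and w are nonzero multiples of each other.

    Two vectors in three-space are proportional exactly when their
    cross product is the zero vector.
    """
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return cross == (0, 0, 0)

v = (1, 2, 3)
L = (1, 1, -1)
print(incident(v, L))                         # True: 1 + 2 - 3 = 0
print(incident((-2, -4, -6), (3, 3, -3)))     # True: other representatives
print(same_projective_object(v, (-2, -4, -6)))  # True
print(incident((1, 0, 0), L))                 # False: 1 + 0 - 0 is not 0
```

Changing representatives multiplies the dot product by a nonzero scalar, which is why the second check gives the same answer as the first.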

Thus, we can do analytic projective geometry. For instance, the projective
line $L = (1 \;\; 1 \;\; {-1})$ has the equation $1v_1 + 1v_2 - 1v_3 = 0$, because points
incident on the line have the property that their representatives satisfy this
equation. One difference from familiar Euclidean analytic geometry is that in
projective geometry we talk about the equation of a point. For a fixed point like

$$v = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$$

the property that characterizes lines through this point (that is, lines incident on
this point) is that the components of any representatives satisfy $1L_1 + 2L_2 + 3L_3 = 0$
and so this is the equation of v.

This symmetry of the statements about lines and points brings up the Duality 
Principle of projective geometry: in any true statement, interchanging 'point' 
with 'line' results in another true statement. For example, just as two distinct 
points determine one and only one line, in the projective plane two distinct lines 
determine one and only one point. Here is a picture showing two lines that cross 
in antipodal spots and thus cross at one projective point. 



(*) 



Contrast this with Euclidean geometry, where two distinct lines may have a 
unique intersection or may be parallel. In this way, projective geometry is 
simpler, more uniform, than Euclidean geometry. 

That simplicity is relevant because there is a relationship between the two
spaces: we can view the projective plane as an extension of the Euclidean plane.
Take the sphere model of the projective plane to be the unit sphere in $\mathbb{R}^3$ and
take Euclidean space to be the plane z = 1. This gives us a way of viewing some
points in projective space as corresponding to points in Euclidean space, because
all of the points on the plane are projections of antipodal spots from the sphere.

(**)

Note though that projective points on the equator don't project up to the plane. 
Instead, these project 'out to infinity'. We can thus think of projective space 
as consisting of the Euclidean plane with some extra points adjoined — the 
Euclidean plane is embedded in the projective plane. These extra points, the 
equatorial points, are the ideal points or points at infinity and the equator is 
the ideal line or line at infinity (it is not a Euclidean line, it is a projective 
line). 

The advantage of the extension to the projective plane is that some of the 
awkwardness of Euclidean geometry disappears. For instance, the projective 
lines shown above in (*) cross at antipodal spots, a single projective point, on 
the sphere's equator. If we put those lines into (**) then they correspond to 
Euclidean lines that are parallel. That is, in moving from the Euclidean plane to 
the projective plane, we move from having two cases, that lines either intersect 
or are parallel, to having only one case, that lines intersect (possibly at a point 
at infinity). 

The projective case is nicer in many ways than the Euclidean case but has
the problem that we don't have the same experience or intuitions with it. That's
one advantage of doing analytic geometry where the equations can lead us to the
right conclusions. Analytic projective geometry uses linear algebra. For instance,
for three points of the projective plane t, u, and v, setting up the equations
for those points by fixing vectors representing each, shows that the three are
collinear (incident in a single line) if and only if the resulting three-equation
system has infinitely many row vector solutions representing that line. That, in
turn, holds if and only if this determinant is zero.

$$\begin{vmatrix} t_1 & u_1 & v_1 \\ t_2 & u_2 & v_2 \\ t_3 & u_3 & v_3 \end{vmatrix}$$

Thus, three points in the projective plane are collinear if and only if any three 
representative column vectors are linearly dependent. Similarly (and illustrating 
the Duality Principle), three lines in the projective plane are incident on a 
single point if and only if any three row vectors representing them are linearly 
dependent. 
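
Here is a brief Python sketch of these determinant tests. It rests on assumptions that the text does not make: the function names are ours, representatives are integer 3-tuples, and we use the standard fact that the cross product of two independent vectors in three-space is orthogonal to both. That fact gives a representative of the line through two points and, dually, of the point where two lines meet.

```python
def cross(a, b):
    """Cross product of two 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def det3(t, u, v):
    """Determinant of the 3x3 matrix whose columns are t, u, and v."""
    c = cross(u, v)
    return t[0] * c[0] + t[1] * c[1] + t[2] * c[2]

def collinear(t, u, v):
    """Three projective points are collinear iff the determinant is zero."""
    return det3(t, u, v) == 0

def line_through(p, q):
    """A representative of the projective line through distinct points p, q."""
    return cross(p, q)

def meet(L, M):
    """A representative of the projective point where distinct lines L, M cross."""
    return cross(L, M)

# Three points on the line (1 1 -1).
print(collinear((1, 2, 3), (1, 0, 1), (0, 1, 1)))   # True
print(line_through((1, 0, 1), (0, 1, 1)))           # (-1, -1, 1), a multiple of (1, 1, -1)

# The Euclidean lines x = 0 and x = 1 in the plane z = 1 are parallel.
# As projective lines they are (1 0 0) and (1 0 -1), and they meet at a
# point whose third component is zero, that is, at an ideal point.
print(meet((1, 0, 0), (1, 0, -1)))                  # (0, 1, 0)
```

The last line illustrates the remark above that in the projective plane lines that are parallel in the Euclidean picture intersect at a point at infinity.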

The following result is more evidence of the niceness of the geometry of the
projective plane, compared with the Euclidean case. These two triangles are in
perspective from the point O because their corresponding vertices are collinear
with O.

Consider the pairs of corresponding sides: the sides $T_1U_1$ and $T_2U_2$, the sides
$T_1V_1$ and $T_2V_2$, and the sides $U_1V_1$ and $U_2V_2$. Desargues's Theorem is that
when we extend the three pairs of corresponding sides to full lines, they intersect
(shown here as the point TU, the point TV, and the point UV), and further,
those three intersection points are collinear.

UV    TV    TU




We will prove this theorem, using projective geometry. (We've drawn these 
as Euclidean figures because it is the more familiar image. To consider them 
as projective figures, we can imagine that, although the line segments shown 
are parts of great circles and so are curved, the model has such a large radius 
compared to the size of the figures that the sides appear in this sketch to be 
straight.) 

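For this proof we need a preliminary lemma [Coxeter]: if W, X, Y, Z are four
points in the projective plane (no three of which are collinear) then there are
homogeneous coordinate vectors $\vec{w}$, $\vec{x}$, $\vec{y}$, and $\vec{z}$ for the projective points, and a
basis B for $\mathbb{R}^3$, satisfying this.

$$\text{Rep}_B(\vec{w}) = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad
\text{Rep}_B(\vec{x}) = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad
\text{Rep}_B(\vec{y}) = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \quad
\text{Rep}_B(\vec{z}) = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

For the proof, because W, X, and Y are not on the same projective line, any
homogeneous coordinate vectors $\vec{w}_0$, $\vec{x}_0$, and $\vec{y}_0$ do not lie on the same plane
through the origin in $\mathbb{R}^3$ and so form a spanning set for $\mathbb{R}^3$. Thus any
homogeneous coordinate vector for Z is a combination $\vec{z}_0 = a \cdot \vec{w}_0 + b \cdot \vec{x}_0 + c \cdot \vec{y}_0$.
None of a, b, or c is zero, because otherwise Z would be collinear with two of
the other points. Then, we can take $\vec{w} = a \cdot \vec{w}_0$, $\vec{x} = b \cdot \vec{x}_0$, $\vec{y} = c \cdot \vec{y}_0$, and $\vec{z} = \vec{z}_0$,
where the basis is $B = \langle \vec{w}, \vec{x}, \vec{y} \rangle$.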

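Now, to prove Desargues's Theorem use the lemma to fix homogeneous
coordinate vectors and a basis.

$$\text{Rep}_B(\vec{t}_1) = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad
\text{Rep}_B(\vec{u}_1) = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad
\text{Rep}_B(\vec{v}_1) = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \quad
\text{Rep}_B(\vec{o}) = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$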
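Because the projective point $T_2$ is incident on the projective line $OT_1$, any
homogeneous coordinate vector for $T_2$ lies in the plane through the origin in
$\mathbb{R}^3$ that is spanned by homogeneous coordinate vectors of O and $T_1$:

$$\mathrm{Rep}_B(\vec{t}_2) = a \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + b \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$$

for some scalars a and b. That is, the homogeneous coordinate vectors of
members $T_2$ of the line $OT_1$ are of the form on the left below, and the forms for
$U_2$ and $V_2$ are similar.

$$\mathrm{Rep}_B(\vec{t}_2) = \begin{pmatrix} t_2 \\ 1 \\ 1 \end{pmatrix} \quad
\mathrm{Rep}_B(\vec{u}_2) = \begin{pmatrix} 1 \\ u_2 \\ 1 \end{pmatrix} \quad
\mathrm{Rep}_B(\vec{v}_2) = \begin{pmatrix} 1 \\ 1 \\ v_2 \end{pmatrix}$$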

The projective line $T_1U_1$ is the image of a plane through the origin in $\mathbb{R}^3$. One
way to get its equation is to note that any vector in it is linearly dependent on
the vectors for $T_1$ and $U_1$ and so this determinant is zero.

$$\begin{vmatrix} 1 & 0 & x \\ 0 & 1 & y \\ 0 & 0 & z \end{vmatrix} = 0 \quad\Longrightarrow\quad z = 0$$

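The equation of the plane in $\mathbb{R}^3$ whose image is the projective line $T_2U_2$ is this.

$$\begin{vmatrix} t_2 & 1 & x \\ 1 & u_2 & y \\ 1 & 1 & z \end{vmatrix} = 0 \quad\Longrightarrow\quad (1 - u_2) \cdot x + (1 - t_2) \cdot y + (t_2u_2 - 1) \cdot z = 0$$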







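Finding the intersection of the two is routine.

$$T_1U_1 \cap T_2U_2 = \begin{pmatrix} t_2 - 1 \\ 1 - u_2 \\ 0 \end{pmatrix}$$

(This is, of course, the homogeneous coordinate vector of a projective point.)
The other two intersections are similar.

$$T_1V_1 \cap T_2V_2 = \begin{pmatrix} 1 - t_2 \\ 0 \\ v_2 - 1 \end{pmatrix} \qquad
U_1V_1 \cap U_2V_2 = \begin{pmatrix} 0 \\ u_2 - 1 \\ 1 - v_2 \end{pmatrix}$$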

Finish the proof by noting that these projective points are on one projective 
line because the sum of the three homogeneous coordinate vectors is zero. 
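The computation above can be spot-checked numerically. The following is a minimal sketch, not part of the original proof: it assumes the coordinates fixed by the lemma, picks arbitrary sample values for the parameters $t_2$, $u_2$, $v_2$, and uses the cross product of homogeneous coordinate vectors both to get the line through two points and to get the point on two lines. The function names are illustrative only.

```python
from fractions import Fraction

def cross(p, q):
    """Cross product of two 3-vectors. In homogeneous coordinates it gives
    the line through two points, or the point incident on two lines."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def det3(a, b, c):
    """Determinant of the 3x3 matrix whose rows are a, b, c."""
    return sum(x * y for x, y in zip(a, cross(b, c)))

def desargues_points(t2, u2, v2):
    """Intersections of corresponding sides, in the coordinates of the lemma."""
    T1, U1, V1 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
    T2, U2, V2 = (t2, 1, 1), (1, u2, 1), (1, 1, v2)
    TU = cross(cross(T1, U1), cross(T2, U2))
    TV = cross(cross(T1, V1), cross(T2, V2))
    UV = cross(cross(U1, V1), cross(U2, V2))
    return TU, TV, UV

# Sample parameter values (an arbitrary choice for illustration).
TU, TV, UV = desargues_points(Fraction(2), Fraction(3), Fraction(5))
vector_sum = tuple(a + b + c for a, b, c in zip(TU, TV, UV))
collinear = det3(TU, TV, UV) == 0
print(TU, TV, UV, vector_sum, collinear)
```

With these sample values the three vectors sum to the zero vector, so they are linearly dependent and the three projective points lie on one projective line.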

Every projective theorem has a translation to a Euclidean version, although
the Euclidean result may be messier to state and prove. Desargues's Theorem
illustrates this. In the translation to Euclidean space, we must treat separately
the case where O lies on the ideal line, for then the lines $T_1T_2$, $U_1U_2$, and $V_1V_2$
are parallel.

The parenthetical remark following the statement of Desargues's Theorem
suggests thinking of the Euclidean pictures as figures from projective geometry 
for a model of very large radius. That is, just as a small area of the world seems 
to people living there to be flat, the projective plane is locally Euclidean. 

Although its local properties are familiar, the projective plane has a global 
property that is quite different from Euclidean space. The picture below shows 
a projective point. At that point we have drawn Cartesian axes, xy-axes. Of 
course, the axes appear in the picture at two antipodal spots, one in the northern 
hemisphere (that is, shown on the right, in black) and the other in the south. 
In the northern hemisphere a person who puts their right hand on the sphere, 
palm down, with their fingers pointing along the x-axis in the positive direction 
will have their thumb point in the positive direction on the y-axis. But the 
antipodal axes give the opposite: if a person puts their right hand on the southern 
hemisphere spot on the sphere, palm on the sphere's surface, with their fingers 
pointing toward positive infinity on the x-axis, then their thumb points on the 
y-axis toward negative infinity. Instead, to have their fingers point positively on
the x-axis and their thumb point positively on the y, a person must use their
left hand. Briefly, the projective plane is not orientable: in this geometry, left
and right handedness are not fixed properties of figures.




The sequence of pictures below dramatizes this non-orientability. They sketch a 
trip around this space in the direction of the y part of the xy-axis. (Warning: 
the trip shown is not halfway around, it is a full circuit. True, if we made this 
into a movie then we could watch the northern hemisphere spots in the drawing 
above gradually rotate about halfway around the sphere to the last picture below. 
And we could watch the southern hemisphere spots in the picture above slide 
through the south pole and up through the equator to the last picture. But: the 
spots at either end of the dashed line are the same projective point. We don't 
need to continue on much further; we are pretty much back to the projective 
point where we started by the last picture.) 




At the end of the circuit, the x part of the xy-axes sticks out in the other
direction. Thus, in the projective plane we cannot describe a figure as right- or
left-handed (another way to make this point is that we cannot describe a spiral
as clockwise or counterclockwise).

This exhibition of the existence of a non-orientable space raises a question: is 
our universe orientable? For instance, could an astronaut leave earth right- 
handed and return left-handed? [Gardner] is a nontechnical reference. [Clarke] 
is a classic science fiction story about orientation reversal. 

So projective geometry is mathematically interesting, in addition to the 
natural way in which it arises in art. It is more than just a technical device to 
shorten some proofs. For an overview, see [Courant & Robbins]. The approach 
we've taken here, the analytic approach, leads to quick theorems and — most 
importantly for us — illustrates the power of linear algebra (see [Hanes], [Ryan], 
and [Eggar]). But another approach, the synthetic approach of deriving the 
results from an axiom system, is both extraordinarily beautiful and is also the 
historical route of development. Two fine sources for this approach are [Coxeter]
and [Seidenberg]. An interesting and easy application is [Davies].

Exercises 

1 What is the equation of this point?

2 (a) Find the line incident on these points in the projective plane.

3 Find the formula for the line incident on two projective points. Find the formula
for the point incident on two projective lines.

4 Prove that the definition of incidence is independent of the choice of the
representatives of p and L. That is, if $p_1, p_2, p_3$, and $q_1, q_2, q_3$ are two triples of
homogeneous coordinates for p, and $L_1, L_2, L_3$, and $M_1, M_2, M_3$ are two triples of
homogeneous coordinates for L, prove that $p_1L_1 + p_2L_2 + p_3L_3 = 0$ if and only if
$q_1M_1 + q_2M_2 + q_3M_3 = 0$.

5 Give a drawing to show that central projection does not preserve circles, that a
circle may project to an ellipse. Can a (non-circular) ellipse project to a circle?

6 Give the formula for the correspondence between the non-equatorial part of the
antipodal model of the projective plane, and the plane z = 1.

7 (Pappus's Theorem) Assume that $T_0$, $U_0$, and $V_0$ are collinear and that $T_1$, $U_1$,
and $V_1$ are collinear. Consider these three points: (i) the intersection $V_2$ of the lines
$T_0U_1$ and $T_1U_0$, (ii) the intersection $U_2$ of the lines $T_0V_1$ and $T_1V_0$, and (iii) the
intersection $T_2$ of $U_0V_1$ and $U_1V_0$.

(a) Draw a (Euclidean) picture.

(b) Apply the lemma used in Desargues's Theorem to get simple homogeneous
coordinate vectors for the T's and $V_0$.

(c) Find the resulting homogeneous coordinate vectors for U's (these must each
involve a parameter as, e.g., $U_0$ could be anywhere on the $T_0V_0$ line).

(d) Find the resulting homogeneous coordinate vectors for $V_1$. (Hint: it involves
two parameters.)

(e) Find the resulting homogeneous coordinate vectors for $V_2$. (It also involves
two parameters.)

(f) Show that the product of the three parameters is 1.

(g) Verify that $V_2$ is on the $T_2U_2$ line.


Similarity 



We have shown that for any homomorphism there are bases B and D such that
the representation matrix has a block partial-identity form.

$$\mathrm{Rep}_{B,D}(h) = \left(\begin{array}{c|c} \text{Identity} & \text{Zero} \\ \hline \text{Zero} & \text{Zero} \end{array}\right)$$

This representation describes the map as sending $c_1\vec{\beta}_1 + \cdots + c_n\vec{\beta}_n$ to
$c_1\vec{\delta}_1 + \cdots + c_k\vec{\delta}_k + \vec{0} + \cdots + \vec{0}$, where n is the dimension of the domain and k is the
dimension of the range. So, under this representation the action of the map is
easy to understand because most of the matrix entries are zero.

This chapter considers the special case where the domain and codomain are
the same. We naturally ask that the basis for the domain and codomain be the
same, that is, we want a B so that $\mathrm{Rep}_{B,B}(t)$ is as simple as possible (we will
take 'simple' to mean that it has many zeroes). We will find that we cannot
always get a matrix having the above block partial-identity form but we will
develop a form that comes close, a representation that is nearly diagonal.



I Complex Vector Spaces 

This chapter requires that we factor polynomials, but many polynomials do
not factor over the real numbers. For instance, $x^2 + 1$ does not factor into
a product of two linear polynomials with real coefficients; instead it requires
complex numbers: $x^2 + 1 = (x - i)(x + i)$.

Therefore, in this chapter we shall use complex numbers for our scalars, 
including entries in vectors and matrices. That is, we are shifting from studying 
vector spaces over the real numbers to vector spaces over the complex numbers. 

Any real number is a complex number and in this chapter most of the
examples use only real numbers. Nonetheless, the critical theorems require that
the scalars be complex, so the first section is a quick review of complex numbers.

In this book, our approach is to shift to this more general context of taking 
scalars to be complex only for the pragmatic reason that we must do so now 
in order to move forward. However, the idea of doing vector spaces by taking 
scalars from a structure other than the real numbers is an interesting and useful 
one. Delightful presentations that take this approach from the start are in 
[Halmos] and [Hoffman & Kunze]. 



I.1 Review of Factoring and Complex Numbers

This subsection is a review only and we take the main results as known. 
For proofs, see [Birkhoff & MacLane] or [Ebbinghaus] . 

We consider a polynomial $p(x) = c_nx^n + \cdots + c_1x + c_0$ with leading
coefficient $c_n \neq 0$. The degree of the polynomial is n if $n \geq 1$. If n = 0 then
p is a constant polynomial $p(x) = c_0$. Constant polynomials that are not the
zero polynomial, $c_0 \neq 0$, have degree zero. We define the zero polynomial to
have degree $-\infty$.

Just as integers have a division operation (e.g., '4 goes 5 times into 21 with
remainder 1'), so do polynomials.

1.1 Theorem (Division Theorem for Polynomials) Let c(x) be a polynomial. If m(x)
is a non-zero polynomial then there are quotient and remainder polynomials
q(x) and r(x) such that

$$c(x) = m(x) \cdot q(x) + r(x)$$

where the degree of r(x) is strictly less than the degree of m(x).

1.2 Remark Defining the degree of the zero polynomial to be $-\infty$, which most
but not all authors do, allows the equation degree(fg) = degree(f) + degree(g)
to hold for all polynomial functions f and g.

The point of the integer division statement '4 goes 5 times into 21 with 
remainder 1 ' is that the remainder is less than 4 — while 4 goes 5 times, it does 
not go 6 times. In the same way, the point of the polynomial division statement 
is its final clause. 

1.3 Example If $c(x) = 2x^3 - 3x^2 + 4x$ and $m(x) = x^2 + 1$ then $q(x) = 2x - 3$ and
$r(x) = 2x + 3$. Note that r(x) has a lower degree than m(x).
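The division in the example can be carried out mechanically. Here is a small sketch of polynomial long division; the function name `poly_divmod` and the convention of listing coefficients from the highest degree down are choices made for this illustration, not notation from the text.

```python
from fractions import Fraction

def poly_divmod(c, m):
    """Divide c(x) by m(x). Both are coefficient lists, highest degree first.
    Returns (q, r) with c = m*q + r and the degree of r less than that of m."""
    c = [Fraction(a) for a in c]
    m = [Fraction(a) for a in m]
    if not m or m[0] == 0:
        raise ValueError("the divisor needs a nonzero leading coefficient")
    q = []
    r = c[:]
    while len(r) >= len(m):
        factor = r[0] / m[0]          # next coefficient of the quotient
        q.append(factor)
        for i in range(len(m)):       # subtract factor * m(x), aligned at the top
            r[i] -= factor * m[i]
        r.pop(0)                      # the leading term is now zero
    return q, r

# c(x) = 2x^3 - 3x^2 + 4x and m(x) = x^2 + 1, as in the example.
q, r = poly_divmod([2, -3, 4, 0], [1, 0, 1])
print(q, r)
```

The returned lists correspond to the quotient 2x - 3 and the remainder 2x + 3.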

1.4 Corollary The remainder when c(x) is divided by $x - \lambda$ is the constant
polynomial $r(x) = c(\lambda)$.

Proof The remainder must be a constant polynomial because it is of degree less
than the divisor $x - \lambda$. To determine the constant, take m(x) from the theorem
to be $x - \lambda$ and substitute $\lambda$ for x to get $c(\lambda) = (\lambda - \lambda) \cdot q(\lambda) + r(x)$. QED

If a divisor m(x) goes into a dividend c(x) evenly, meaning that r(x) is the
zero polynomial, then m(x) is a factor of c(x). Any root of the factor (any
$\lambda \in \mathbb{R}$ such that $m(\lambda) = 0$) is a root of c(x) since $c(\lambda) = m(\lambda) \cdot q(\lambda) = 0$.

1.5 Corollary If $\lambda$ is a root of the polynomial c(x) then $x - \lambda$ divides c(x) evenly,
that is, $x - \lambda$ is a factor of c(x).

Proof By the above corollary $c(x) = (x - \lambda) \cdot q(x) + c(\lambda)$. Since $\lambda$ is a root,
$c(\lambda) = 0$ so $x - \lambda$ is a factor. QED
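Both corollaries can be seen at work in one pass of Horner's rule (synthetic division), which divides by $x - \lambda$ and produces the remainder $c(\lambda)$ at the same time. The sketch below is an illustration under assumed conventions (coefficients listed from the highest degree down); the sample cubic is a made-up example, not one from the text.

```python
def synthetic_division(coeffs, lam):
    """Divide the polynomial with the given coefficients (highest degree
    first) by x - lam. Returns (quotient coefficients, remainder), and the
    remainder equals the value of the polynomial at lam."""
    running = []
    value = 0
    for a in coeffs:
        value = value * lam + a   # Horner step
        running.append(value)
    return running[:-1], running[-1]

# Hypothetical sample: x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3).
cubic = [1, -6, 11, -6]
quotient_at_root, remainder_at_root = synthetic_division(cubic, 2)
quotient_other, remainder_other = synthetic_division(cubic, 4)
print(quotient_at_root, remainder_at_root, remainder_other)
```

Since 2 is a root, the remainder is zero and the quotient $x^2 - 4x + 3$ is the complementary factor; dividing by $x - 4$ instead leaves the remainder c(4) = 6.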

Finding the roots and factors of a high-degree polynomial can be hard.
But for second-degree polynomials we have the quadratic formula: the roots of
$ax^2 + bx + c$ are

$$\lambda_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \qquad \lambda_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}$$

(if the discriminant $b^2 - 4ac$ is negative then the polynomial has no real number
roots). A polynomial that cannot be factored into two lower-degree polynomials
with real number coefficients is irreducible over the reals.

1.6 Theorem Any constant or linear polynomial is irreducible over the reals. A 
quadratic polynomial is irreducible over the reals if and only if its discriminant 
is negative. No cubic or higher-degree polynomial is irreducible over the reals. 

1.7 Corollary Any polynomial with real coefficients can be factored into linear 
and irreducible quadratic polynomials. That factorization is unique; any two 
factorizations have the same powers of the same factors. 

Note the analogy with the prime factorization of integers. In both cases, the 
uniqueness clause is very useful. 

1.8 Example Because of uniqueness we know, without multiplying them out, that
$(x + 3)^2(x^2 + 1)^3$ does not equal $(x + 3)^4(x^2 + x + 1)^2$.

1.9 Example By uniqueness, if $c(x) = m(x) \cdot q(x)$ then where $c(x) = (x - 3)^2(x + 2)^3$
and $m(x) = (x - 3)(x + 2)^2$, we know that $q(x) = (x - 3)(x + 2)$.

While $x^2 + 1$ has no real roots and so doesn't factor over the real numbers,
if we imagine a root, traditionally denoted i so that $i^2 + 1 = 0$, then $x^2 + 1$
factors into a product of linears $(x - i)(x + i)$.

So we adjoin this root i to the reals and close the new system with respect 
to addition, multiplication, etc. (i.e., we also add 3 + i, and 2i, and 3 + 2i, etc., 
putting in all linear combinations of 1 and i). We then get a new structure, the 
complex numbers C. 
In $\mathbb{C}$ we can factor (obviously, at least some) quadratics that would be
irreducible if we were to stick to the real numbers. Surprisingly, in $\mathbb{C}$ we can
not only factor $x^2 + 1$ and its close relatives, we can factor any quadratic.

$$ax^2 + bx + c = a \cdot \Bigl(x - \frac{-b + \sqrt{b^2 - 4ac}}{2a}\Bigr) \cdot \Bigl(x - \frac{-b - \sqrt{b^2 - 4ac}}{2a}\Bigr)$$

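1.10 Example The second degree polynomial $x^2 + x + 1$ factors over the complex
numbers into the product of two first degree polynomials.

$$\Bigl(x - \frac{-1 + \sqrt{-3}}{2}\Bigr)\Bigl(x - \frac{-1 - \sqrt{-3}}{2}\Bigr) = \Bigl(x - \bigl(-\tfrac{1}{2} + \tfrac{\sqrt{3}}{2}i\bigr)\Bigr)\Bigl(x - \bigl(-\tfrac{1}{2} - \tfrac{\sqrt{3}}{2}i\bigr)\Bigr)$$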
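As a quick numerical check of the quadratic formula over the complex numbers, here is a sketch using Python's standard `cmath` module; the helper name `quadratic_roots` is an assumption of this illustration. It finds the two complex roots of $x^2 + x + 1$.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*x^2 + b*x + c by the quadratic formula, allowing a
    negative discriminant by taking the square root in the complex numbers."""
    root_of_disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root_of_disc) / (2 * a), (-b - root_of_disc) / (2 * a)

r1, r2 = quadratic_roots(1, 1, 1)
print(r1, r2)
# Each root should make the polynomial (approximately) zero.
print(abs(r1 * r1 + r1 + 1), abs(r2 * r2 + r2 + 1))
```

The roots come out as -1/2 plus or minus (sqrt(3)/2)i, up to floating point rounding.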

1.11 Theorem (Fundamental Theorem of Algebra) Polynomials with complex coeffi- 
cients factor into linear polynomials with complex coefficients. The factorization 
is unique. 



I.2 Complex Representations

Recall the definitions of the complex number addition 

(a + bi) + (c + di) = (a + c) + (b + d)i 

and multiplication. 

$$(a + bi)(c + di) = ac + adi + bci + bd(-1) = (ac - bd) + (ad + bc)i$$

2.1 Example For instance, $(1 - 2i) + (5 + 4i) = 6 + 2i$ and $(2 - 3i)(4 - 0.5i) =
6.5 - 13i$.

Handling scalar operations with those rules, all of the operations that we've 
covered for real vector spaces carry over unchanged. 

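2.2 Example Matrix multiplication is the same, although the scalar arithmetic
involves more bookkeeping.

$$\begin{pmatrix} 1 + 1i & 2 - 0i \\ i & -2 + 3i \end{pmatrix} \begin{pmatrix} 1 + 0i & 1 - 0i \\ 3i & -i \end{pmatrix}$$

$$= \begin{pmatrix} (1 + 1i) \cdot (1 + 0i) + (2 - 0i) \cdot (3i) & (1 + 1i) \cdot (1 - 0i) + (2 - 0i) \cdot (-i) \\ (i) \cdot (1 + 0i) + (-2 + 3i) \cdot (3i) & (i) \cdot (1 - 0i) + (-2 + 3i) \cdot (-i) \end{pmatrix}$$

$$= \begin{pmatrix} 1 + 7i & 1 - 1i \\ -9 - 5i & 3 + 3i \end{pmatrix}$$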
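The bookkeeping in this product can be delegated to Python's built-in complex type, where `1j` plays the role of i. This is only a sketch; `mat_mul` is a helper written for this illustration and the entries are those of the example.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows; the entries may be
    complex numbers."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1 + 1j, 2 - 0j],
     [1j, -2 + 3j]]
B = [[1 + 0j, 1 - 0j],
     [3j, -1j]]
product = mat_mul(A, B)
for row in product:
    print(row)
```

The entries agree with the hand computation: 1 + 7i and 1 - i in the first row, -9 - 5i and 3 + 3i in the second.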
We shall also carry over unchanged from the previous chapters everything
that we can. For instance, we shall call this

$$\langle \begin{pmatrix} 1 + 0i \\ 0 + 0i \\ \vdots \\ 0 + 0i \end{pmatrix}, \ldots, \begin{pmatrix} 0 + 0i \\ 0 + 0i \\ \vdots \\ 1 + 0i \end{pmatrix} \rangle$$

the standard basis for $\mathbb{C}^n$ as a vector space over $\mathbb{C}$ and again denote it $\mathcal{E}_n$.
II Similarity



We've defined two matrices H and $\hat{H}$ to be matrix equivalent if there are
nonsingular P and Q such that $\hat{H} = PHQ$. We were motivated by this diagram
showing both H and $\hat{H}$ representing a map h, but with respect to different pairs
of bases, B, D and $\hat{B}$, $\hat{D}$.



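$$\begin{array}{ccc}
V_{\text{wrt } B} & \xrightarrow[\;H\;]{\;h\;} & W_{\text{wrt } D} \\
\text{id}\,\downarrow & & \text{id}\,\downarrow \\
V_{\text{wrt } \hat{B}} & \xrightarrow[\;\hat{H}\;]{\;h\;} & W_{\text{wrt } \hat{D}}
\end{array}$$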

We now consider the special case where the codomain equals the domain 
and in particular we add the requirement that the codomain's basis equals the 
domain's basis, so we are considering representations with respect to B, B and 
D,D. 

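$$\begin{array}{ccc}
V_{\text{wrt } B} & \xrightarrow{\;t\;} & V_{\text{wrt } B} \\
\text{id}\,\downarrow & & \text{id}\,\downarrow \\
V_{\text{wrt } D} & \xrightarrow{\;t\;} & V_{\text{wrt } D}
\end{array}$$

In matrix terms, $\mathrm{Rep}_{D,D}(t) = \mathrm{Rep}_{B,D}(\mathrm{id}) \; \mathrm{Rep}_{B,B}(t) \; \bigl(\mathrm{Rep}_{B,D}(\mathrm{id})\bigr)^{-1}$.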

II.1 Definition and Examples

1.1 Definition The matrices T and S are similar if there is a nonsingular P such
that $T = PSP^{-1}$.

Since nonsingular matrices are square, T and S must be square and of the same 
size. Exercise 12 checks that similarity is an equivalence relation. 
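To make the definition concrete, here is a small sketch that builds a matrix similar to a given one. The matrices P and S below are hypothetical sample values chosen for this illustration (they are not asserted to be the ones used in the text's examples), and the helper names are likewise assumptions. Exact rational arithmetic avoids rounding.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def det_2x2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inverse_2x2(M):
    """Inverse of a nonsingular 2x2 matrix with integer entries."""
    d = det_2x2(M)
    if d == 0:
        raise ValueError("matrix is singular")
    (a, b), (c, e) = M
    return [[Fraction(e, d), Fraction(-b, d)],
            [Fraction(-c, d), Fraction(a, d)]]

# Hypothetical sample matrices.
P = [[2, 1], [1, 1]]
S = [[2, -3], [1, -1]]
T = mat_mul(mat_mul(P, S), inverse_2x2(P))   # T = P S P^{-1}
print(T, det_2x2(S), det_2x2(T))
```

Here T and S look quite different entry by entry, yet they share the same determinant, a first hint of the properties that members of a similarity class have in common.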

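1.2 Example Calculation with these two,

[The matrices P and S are missing from the source.]

gives that S is similar to this matrix.

[The resulting matrix T is missing from the source.]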

1.3 Example The only matrix similar to the zero matrix is itself: $PZP^{-1} = PZ = Z$.
The identity matrix has the same property: $PIP^{-1} = PP^{-1} = I$.
Since matrix similarity is a special case of matrix equivalence, if two matrices
are similar then they are matrix equivalent. What about the converse: must 
any two matrix equivalent square matrices be similar? No; the prior example 
shows that the similarity classes are different from the matrix equivalence classes 
because the matrix equivalence class of an identity consists of all nonsingular 
matrices of that size. Thus these two are matrix equivalent but not similar. 





So some matrix equivalence classes split into two or more similarity classes — 
similarity gives a finer partition than does equivalence. This pictures some 
matrix equivalence classes subdivided into similarity classes. 



To understand the similarity relation we shall study the similarity classes. 
We approach this question in the same way that we've studied both the row 
equivalence and matrix equivalence relations, by finding a canonical form for 
representatives* of the similarity classes, called Jordan form. With this canonical 
form, we can decide if two matrices are similar by checking whether they are in 
a class with the same representative. We've also seen with both row equivalence 
and matrix equivalence that a canonical form gives us insight into the ways in
which members of the same class are alike (e.g., two identically-sized matrices 
are matrix equivalent if and only if they have the same rank). 



Exercises 

1.4 For 



check that $T = PSP^{-1}$.
✓ 1.5 Example 1.3 shows that the only matrix similar to a zero matrix is itself and
that the only matrix similar to the identity is itself.

(a) Show that the 1 x 1 matrix (2), also, is similar only to itself. 

(b) Is a matrix of the form cl for some scalar c similar only to itself? 

(c) Is a diagonal matrix similar only to itself? 
✓ 1.6 Show that these matrices are not similar.

[The two matrices for this exercise are missing from the source.]

1.7 Consider the transformation $t\colon \mathcal{P}_2 \to \mathcal{P}_2$ described by $x^2 \mapsto x + 1$, $x \mapsto x^2 - 1$,
and $1 \mapsto 3$.

(a) Find $T = \mathrm{Rep}_{B,B}(t)$ where $B = \langle x^2, x, 1 \rangle$.

(b) Find $S = \mathrm{Rep}_{D,D}(t)$ where $D = \langle 1, 1 + x, 1 + x + x^2 \rangle$.

(c) Find the matrix P such that $T = PSP^{-1}$.

* More information on representatives is in the appendix.

✓ 1.8 Exhibit a nontrivial similarity relationship in this way: let $t\colon \mathbb{C}^2 \to \mathbb{C}^2$ act by

[The formula for t is missing from the source.]

and pick two bases, and represent t with respect to them, $T = \mathrm{Rep}_{B,B}(t)$ and
$S = \mathrm{Rep}_{D,D}(t)$. Then compute the P and $P^{-1}$ to change bases from B to D and
back again.

1.9 Explain Example 1.3 in terms of maps. 
✓ 1.10 [Halmos] Are there two matrices A and B that are similar while $A^2$ and $B^2$ are
not similar?

/ 1.11 Prove that if two matrices are similar and one is invertible then so is the other. 
/ 1.12 Show that similarity is an equivalence relation. 

1.13 Consider a matrix representing, with respect to some B, B, reflection across
the x-axis in $\mathbb{R}^2$. Consider also a matrix representing, with respect to some D, D,
reflection across the y-axis. Must they be similar?

1.14 Prove that similarity preserves determinants and rank. Does the converse hold? 

1.15 Is there a matrix equivalence class with only one matrix similarity class inside? 
One with infinitely many similarity classes? 

1.16 Can two different diagonal matrices be in the same similarity class? 

✓ 1.17 Prove that if two matrices are similar then their k-th powers are similar when
k > 0. What if $k \leq 0$?

✓ 1.18 Let p(x) be the polynomial $c_nx^n + \cdots + c_1x + c_0$. Show that if T is similar to
S then $p(T) = c_nT^n + \cdots + c_1T + c_0I$ is similar to $p(S) = c_nS^n + \cdots + c_1S + c_0I$.

1.19 List all of the matrix equivalence classes of 1 xl matrices. Also list the similarity 
classes, and describe which similarity classes are contained inside of each matrix 
equivalence class. 

1.20 Does similarity preserve sums? 

1.21 Show that if $T - \lambda I$ and N are similar matrices then T and $N + \lambda I$ are also
similar.



II.2 Diagonalizability

The prior subsection shows that although similar matrices are necessarily matrix 
equivalent, the converse does not hold. Some matrix equivalence classes break 
into two or more similarity classes; for instance, the nonsingular 2x2 matrices 
form one matrix equivalence class but more than one similarity class. 

Thus we cannot use the canonical form for matrix equivalence, a block 
partial-identity matrix, as a canonical form for matrix similarity. The diagram 
below illustrates. The stars are similarity class representatives. Each dashed-line 
similarity class subdivision has one star but each solid-curve matrix equivalence 
class division has only one partial identity matrix. 
To develop a canonical form for representatives of the similarity classes we
naturally build on previous work. This means first that the partial identity 
matrices should represent the similarity classes into which they fall. Beyond 
that, the representatives should be as simple as possible and the partial identities 
are simple in that they consist mostly of zero entries. The simplest extension of 
the partial identity form is the diagonal form. 

2.1 Definition A transformation is diagonalizable if it has a diagonal
representation with respect to the same basis for the codomain as for the domain. A
diagonalizable matrix is one that is similar to a diagonal matrix: T is
diagonalizable if there is a nonsingular P such that $PTP^{-1}$ is diagonal.



2.2 Example The matrix 



is diagonalizable. 





2.3 Example This matrix is not diagonalizable 

because it is not the zero matrix but its square is the zero matrix. The fact that
N is not the zero matrix means that it cannot be similar to the zero matrix,
by Example 1.3. So if N is similar to a diagonal matrix D then D has at least
one nonzero entry on its diagonal. The fact that N's square is the zero matrix
means that for any map n that N represents, the composition $n \circ n$ is the zero
map. The only matrix representing the zero map is the zero matrix and thus
$D^2$ would have to be the zero matrix. But $D^2$ cannot be the zero matrix because
the square of a diagonal matrix is the diagonal matrix whose entries are the
squares of the entries from the starting matrix, and D is not the zero matrix.
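The two facts that drive this argument are easy to see numerically. In the sketch below the matrix N is a hypothetical stand-in with the stated property (nonzero, with zero square), and D is a sample diagonal matrix with one nonzero entry; neither is claimed to be the matrix printed in the example.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

ZERO = [[0, 0], [0, 0]]

# Hypothetical nonzero matrix whose square is the zero matrix.
N = [[0, 0],
     [1, 0]]
N_squared = mat_mul(N, N)

# A sample diagonal matrix with at least one nonzero diagonal entry.
D = [[2, 0],
     [0, 0]]
D_squared = mat_mul(D, D)   # diagonal entries are the squares 4 and 0

print(N_squared == ZERO, D_squared == ZERO)
```

The square of N vanishes while the square of D keeps the nonzero entry 4, which is why no such D can be similar to N.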

That example shows that a diagonal form will not suffice as a canonical 
form — we cannot find a diagonal matrix in each matrix similarity class. However, 
the canonical form that we are developing has the property that if a matrix can 
be diagonalized then the diagonal matrix is the canonical representative of its
similarity class. 
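2.4 Lemma A transformation t is diagonalizable if and only if there is a basis
$B = \langle \vec{\beta}_1, \ldots, \vec{\beta}_n \rangle$ and scalars $\lambda_1, \ldots, \lambda_n$ such that $t(\vec{\beta}_i) = \lambda_i\vec{\beta}_i$ for each i.

Proof Consider a diagonal representation matrix.

$$\mathrm{Rep}_{B,B}(t) = \begin{pmatrix} \vdots & & \vdots \\ \mathrm{Rep}_B(t(\vec{\beta}_1)) & \cdots & \mathrm{Rep}_B(t(\vec{\beta}_n)) \\ \vdots & & \vdots \end{pmatrix} = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}$$

Consider the representation of a member of this basis with respect to the basis,
$\mathrm{Rep}_B(\vec{\beta}_i)$. The product of the diagonal matrix and the representation vector

$$\mathrm{Rep}_B(t(\vec{\beta}_i)) = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix} \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ \lambda_i \\ \vdots \\ 0 \end{pmatrix}$$

has the stated action. QED

2.5 Example To diagonalize

$$T = \begin{pmatrix} 3 & 2 \\ 0 & 1 \end{pmatrix}$$

we take it as the representation of a transformation with respect to the standard
basis $T = \mathrm{Rep}_{\mathcal{E}_2,\mathcal{E}_2}(t)$ and we look for a basis $B = \langle \vec{\beta}_1, \vec{\beta}_2 \rangle$ such that

$$\mathrm{Rep}_{B,B}(t) = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$$

that is, such that $t(\vec{\beta}_1) = \lambda_1\vec{\beta}_1$ and $t(\vec{\beta}_2) = \lambda_2\vec{\beta}_2$.

$$\begin{pmatrix} 3 & 2 \\ 0 & 1 \end{pmatrix} \vec{\beta}_1 = \lambda_1 \cdot \vec{\beta}_1 \qquad \begin{pmatrix} 3 & 2 \\ 0 & 1 \end{pmatrix} \vec{\beta}_2 = \lambda_2 \cdot \vec{\beta}_2$$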



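We are looking for scalars x such that this equation

$$\begin{pmatrix} 3 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = x \cdot \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$$

has solutions $b_1$ and $b_2$, which are not both zero (the zero vector is not the
member of any basis). That's a linear system.

$$\begin{array}{rcl} (3 - x) \cdot b_1 + 2 \cdot b_2 &=& 0 \\ (1 - x) \cdot b_2 &=& 0 \end{array} \qquad (*)$$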



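Focus first on the bottom equation. The two numbers multiply to give zero only
if at least one of them is zero so there are two cases, $b_2 = 0$ or $x = 1$. In the
$b_2 = 0$ case, the first equation gives that either $b_1 = 0$ or $x = 3$. Since we've
disallowed the case of both $b_1 = 0$ and $b_2 = 0$, we are left with $\lambda_1 = 3$. Then
the first equation in (*) is $0 \cdot b_1 + 2 \cdot b_2 = 0$ and so associated with $\lambda_1 = 3$ are
vectors with a second component of zero and a first component that is free.

$$\begin{pmatrix} 3 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} b_1 \\ 0 \end{pmatrix} = 3 \cdot \begin{pmatrix} b_1 \\ 0 \end{pmatrix}$$

Choose any nonzero $b_1$ to have a first basis vector.

$$\vec{\beta}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$

The second case for the bottom equation of (*) is $\lambda_2 = 1$. The first equation in
(*) is then $2 \cdot b_1 + 2 \cdot b_2 = 0$ and so associated with $\lambda_2 = 1$ are vectors such that their
second component is the negative of their first.

$$\begin{pmatrix} 3 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} b_1 \\ -b_1 \end{pmatrix} = 1 \cdot \begin{pmatrix} b_1 \\ -b_1 \end{pmatrix}$$

Choose a nonzero one of these to have a second basis vector.

$$\vec{\beta}_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$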



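Now drawing the similarity diagram

$$\begin{array}{ccc}
\mathbb{C}^2_{\text{wrt } \mathcal{E}_2} & \xrightarrow[\;T\;]{\;t\;} & \mathbb{C}^2_{\text{wrt } \mathcal{E}_2} \\
\text{id}\,\downarrow & & \text{id}\,\downarrow \\
\mathbb{C}^2_{\text{wrt } B} & \xrightarrow[\;D\;]{\;t\;} & \mathbb{C}^2_{\text{wrt } B}
\end{array}$$

and noting that the matrix $\mathrm{Rep}_{B,\mathcal{E}_2}(\mathrm{id})$ is easy gives us this diagonalization.

$$\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}^{-1} \begin{pmatrix} 3 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}$$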

In the next subsection, we will expand on that example by considering more 
closely the property of Lemma 2.4. This includes seeing another way, the way 
that we will routinely use, to find the A's. 
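The diagonalization in Example 2.5 can be confirmed by direct multiplication. The sketch below assumes the basis vectors (1, 0) and (1, -1) found in that example and places them as the columns of a change of basis matrix, here called `P` for this illustration only; the helper functions are likewise assumptions of the sketch.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse_2x2(M):
    """Inverse of a nonsingular 2x2 matrix with integer entries."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

T = [[3, 2],
     [0, 1]]
# Columns are the basis vectors (1, 0) and (1, -1).
P = [[1, 1],
     [0, -1]]
diagonal = mat_mul(mat_mul(inverse_2x2(P), T), P)
print(diagonal)
```

The result is the diagonal matrix with entries 3 and 1, the two scalars found in the example. For this particular P the inverse happens to equal P itself.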



Exercises 



/ 2.6 Repeat Example 2.5 for the matrix from Example 2.2. 
2.7 Diagonalize these upper triangular matrices. 
✓ 2.8 What form do the powers of a diagonal matrix have?

2.9 Give two same-sized diagonal matrices that are not similar. Must any two 
different diagonal matrices come from different similarity classes? 

2.10 Give a nonsingular diagonal matrix. Can a diagonal matrix ever be singular? 
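✓ 2.11 Show that the inverse of a diagonal matrix is the diagonal of the inverses, if
no element on that diagonal is zero. What happens when a diagonal entry is zero?

2.12 The equation ending Example 2.5

$$\begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}^{-1} \begin{pmatrix} 3 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}$$

is a bit jarring because for P we must take the first matrix, which is shown as an
inverse, and for $P^{-1}$ we take the inverse of the first matrix, so that the two $-1$
powers cancel and this matrix is shown without a superscript $-1$.

(a) Check that this nicer-appearing equation holds.

$$\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 3 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}^{-1}$$

(b) Is the previous item a coincidence? Or can we always switch the P and the
$P^{-1}$?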

2.13 Show that the P used to diagonalize in Example 2.5 is not unique. 

2.14 Find a formula for the powers of this matrix Hint: see Exercise 8. 

-3 r 

/ 2.15 Diagonalize these. 

2.16 We can ask how diagonalization interacts with the matrix operations. Assume 
that t, s : V ^> V are each diagonalizable. Is ct diagonalizable for all scalars c? 
What about t + s? t o s? 
/ 2.17 Show that matrices of this form are not diagonalizable. 

2.18 Show that each of these is diagonalizable. 

G ^) (y z) x,y,z scalars 



II.3 Eigenvalues and Eigenvectors

In this subsection we will focus on the property of Lemma 2.4. 

3.1 Definition A transformation $t\colon V \to V$ has a scalar eigenvalue $\lambda$ if there is a
nonzero eigenvector $\vec{\zeta} \in V$ such that $t(\vec{\zeta}) = \lambda \cdot \vec{\zeta}$.

("Eigen" is German for "characteristic of" or "peculiar to." Some authors call 
these characteristic values and vectors. No one calls them "peculiar.") 
3.2 Remark This definition requires that the eigenvector be non-$\vec{0}$. Some authors
allow $\vec{0}$ as an eigenvector for $\lambda$ as long as there are also non-$\vec{0}$ vectors associated
with $\lambda$. Neither style of definition is clearly better; both involve small tradeoffs.
In both styles the key point is to not allow a case where $\lambda$ is such that $t(\vec{v}) = \lambda\vec{v}$
for only the single vector $\vec{v} = \vec{0}$.

Also, note that $\lambda$ could be 0. The issue is whether $\vec{\zeta}$ could be $\vec{0}$.

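3.3 Example The projection map

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \;\stackrel{\pi}{\longmapsto}\; \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}$$

has an eigenvalue of 1 associated with any eigenvector of the form

$$\begin{pmatrix} x \\ y \\ 0 \end{pmatrix}$$

where x and y are scalars that are not both zero. On the other hand, 2 is not
an eigenvalue of $\pi$ since no non-$\vec{0}$ vector is doubled.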

3.4 Example The only transformation on the trivial space $\{\vec{0}\}$ is $\vec{0} \mapsto \vec{0}$. This
map has no eigenvalues because there are no non-$\vec{0}$ vectors $\vec{v}$ mapped to a scalar
multiple $\lambda \cdot \vec{v}$ of themselves.

3.5 Example Consider the homomorphism $t\colon \mathcal{P}_1 \to \mathcal{P}_1$ given by $c_0 + c_1x \mapsto
(c_0 + c_1) + (c_0 + c_1)x$. While the codomain of t is two-dimensional, its range
is one-dimensional, $\mathscr{R}(t) = \{c + cx \mid c \in \mathbb{C}\}$. Application of t to a vector in that
range will simply rescale the vector, $c + cx \mapsto (2c) + (2c)x$. That is, t has an
eigenvalue of 2 associated with eigenvectors of the form $c + cx$ where $c \neq 0$.

This map also has an eigenvalue of 0 associated with eigenvectors of the form
$c - cx$ where $c \neq 0$.

3.6 Definition A square matrix T has a scalar eigenvalue $\lambda$ associated with the
nonzero eigenvector $\vec{\zeta}$ if $T\vec{\zeta} = \lambda \cdot \vec{\zeta}$.

Although this extension from maps to matrices is natural, we need to make
one observation. Eigenvalues of a map are also the eigenvalues of matrices
representing that map and so similar matrices have the same eigenvalues. But the
eigenvectors can differ: similar matrices need not have the same eigenvectors.

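3.7 Example Consider again the transformation $t\colon \mathcal{P}_1 \to \mathcal{P}_1$ from Example 3.5
given by $c_0 + c_1x \mapsto (c_0 + c_1) + (c_0 + c_1)x$. One of its eigenvalues is 2, associated
with the eigenvectors $c + cx$ where $c \neq 0$. If we represent t with respect to
$B = \langle 1 + 1x, 1 - 1x \rangle$

$$T = \mathrm{Rep}_{B,B}(t) = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}$$

then 2 is an eigenvalue of the matrix T, associated with these eigenvectors.

$$\{ \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} \mid \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 2c_0 \\ 2c_1 \end{pmatrix} \} = \{ \begin{pmatrix} c_0 \\ 0 \end{pmatrix} \mid c_0 \in \mathbb{C},\; c_0 \neq 0 \}$$

On the other hand, if we represent t with respect to $D = \langle 2 + 1x, 1 + 0x \rangle$

$$S = \mathrm{Rep}_{D,D}(t) = \begin{pmatrix} 3 & 1 \\ -3 & -1 \end{pmatrix}$$

then the eigenvectors associated with the eigenvalue 2 are these.

$$\{ \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} \mid \begin{pmatrix} 3 & 1 \\ -3 & -1 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 2c_0 \\ 2c_1 \end{pmatrix} \} = \{ \begin{pmatrix} c_0 \\ -c_0 \end{pmatrix} \mid c_0 \in \mathbb{C},\; c_0 \neq 0 \}$$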

3.8 Remark Here is an informal description of the reason for the difference.
The underlying transformation doubles the eigenvectors $\vec{v} \mapsto 2 \cdot \vec{v}$. But when
the matrix representing the transformation is $T = \mathrm{Rep}_{B,B}(t)$ then the matrix
"assumes" that column vectors are representations with respect to B. In contrast,
$S = \mathrm{Rep}_{D,D}(t)$ "assumes" that column vectors are representations with respect
to D. So the column vector representations that get doubled by each matrix are
different.
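The remark can be checked on the map of Example 3.7. In this sketch the matrices T and S are recomputed from the formula $c_0 + c_1x \mapsto (c_0 + c_1) + (c_0 + c_1)x$ with respect to the bases B and D of that example, so they should be read as this illustration's reconstruction; the helper names and the encoding of a polynomial $a + bx$ as the pair (a, b) are assumptions.

```python
def mat_vec(M, v):
    """Product of a matrix (list of rows) and a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def to_polynomial(coords, basis):
    """Convert a coordinate column to the polynomial it represents.
    Polynomials a + b*x are encoded as pairs (a, b)."""
    const = sum(c * b[0] for c, b in zip(coords, basis))
    linear = sum(c * b[1] for c, b in zip(coords, basis))
    return (const, linear)

B = [(1, 1), (1, -1)]   # the basis 1 + x, 1 - x
D = [(2, 1), (1, 0)]    # the basis 2 + x, 1
T = [[2, 0], [0, 0]]    # representation of t with respect to B, B
S = [[3, 1], [-3, -1]]  # representation of t with respect to D, D

v_B = [1, 0]            # an eigenvector of T for the eigenvalue 2
v_D = [1, -1]           # an eigenvector of S for the eigenvalue 2

print(mat_vec(T, v_B), mat_vec(S, v_D))
print(to_polynomial(v_B, B), to_polynomial(v_D, D))
```

The two column vectors differ, but each is doubled by its own matrix and both represent the same polynomial 1 + x, an eigenvector of the underlying map.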

The next example shows the basic tool for finding eigenvectors and eigenval- 
ues. 

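3.9 Example If

$$T = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 0 & -2 \\ -1 & 2 & 3 \end{pmatrix}$$

then to find the scalars x such that $T\vec{\zeta} = x\vec{\zeta}$ for nonzero eigenvectors $\vec{\zeta}$, bring
everything to the left-hand side

$$\begin{pmatrix} 1 & 2 & 1 \\ 2 & 0 & -2 \\ -1 & 2 & 3 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} - x \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \vec{0}$$

and factor $(T - xI)\vec{\zeta} = \vec{0}$. (Note that it says $T - xI$. The expression $T - x$ doesn't
make sense because T is a matrix while x is a scalar.) This homogeneous linear
system

$$\begin{pmatrix} 1 - x & 2 & 1 \\ 2 & 0 - x & -2 \\ -1 & 2 & 3 - x \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

has a nonzero solution $\vec{z}$ if and only if the matrix is singular. We can determine
when that happens.

$$0 = |T - xI| = \begin{vmatrix} 1 - x & 2 & 1 \\ 2 & 0 - x & -2 \\ -1 & 2 & 3 - x \end{vmatrix} = -(x^3 - 4x^2 + 4x) = -x(x - 2)^2$$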


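The eigenvalues are $\lambda_1 = 0$ and $\lambda_2 = 2$. To find the associated eigenvectors plug
in each eigenvalue. Plugging in $\lambda_1 = 0$ gives

$$\begin{pmatrix} 1 - 0 & 2 & 1 \\ 2 & 0 - 0 & -2 \\ -1 & 2 & 3 - 0 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \quad\Longrightarrow\quad \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \begin{pmatrix} a \\ -a \\ a \end{pmatrix}$$

for $a \neq 0$ (a is non-0 because eigenvectors must be non-$\vec{0}$). Plugging in $\lambda_2 = 2$
gives

$$\begin{pmatrix} 1 - 2 & 2 & 1 \\ 2 & 0 - 2 & -2 \\ -1 & 2 & 3 - 2 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \quad\Longrightarrow\quad \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \begin{pmatrix} b \\ 0 \\ b \end{pmatrix}$$

with $b \neq 0$.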
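The eigenvalues and eigenvectors of Example 3.9 can be double-checked without any algebra. This sketch evaluates the determinant of $T - xI$ at chosen values of x and multiplies T against candidate eigenvectors; the function names are assumptions of the illustration, and the vectors are the $a = 1$ and $b = 1$ members of the two families found above.

```python
def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly_value(T, x):
    """Value of |T - xI| at the scalar x."""
    shifted = [[T[r][c] - (x if r == c else 0) for c in range(3)]
               for r in range(3)]
    return det3(shifted)

def mat_vec(M, v):
    """Product of a matrix (list of rows) and a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

T = [[1, 2, 1],
     [2, 0, -2],
     [-1, 2, 3]]

print(char_poly_value(T, 0), char_poly_value(T, 2), char_poly_value(T, 1))
print(mat_vec(T, [1, -1, 1]))   # eigenvalue 0: the result is the zero vector
print(mat_vec(T, [1, 0, 1]))    # eigenvalue 2: the result is twice the input
```

The determinant vanishes at x = 0 and x = 2 but not at x = 1, matching the factorization into x and $(x - 2)^2$ up to sign.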
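3.10 Example If

$$S = \begin{pmatrix} \pi & 1 \\ 0 & 3 \end{pmatrix}$$

(here $\pi$ is not a projection map, it is the number 3.14...) then

$$\begin{vmatrix} \pi - x & 1 \\ 0 & 3 - x \end{vmatrix} = (x - \pi)(x - 3)$$

so S has eigenvalues of $\lambda_1 = \pi$ and $\lambda_2 = 3$. To find associated eigenvectors, first
plug in $\lambda_1$ for x

$$\begin{pmatrix} \pi - \pi & 1 \\ 0 & 3 - \pi \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad\Longrightarrow\quad \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} a \\ 0 \end{pmatrix}$$

for a scalar $a \neq 0$. Then plug in $\lambda_2$

$$\begin{pmatrix} \pi - 3 & 1 \\ 0 & 3 - 3 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad\Longrightarrow\quad \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} -b/(\pi - 3) \\ b \end{pmatrix}$$

where $b \neq 0$.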
3.11 Definition The characteristic polynomial of a square matrix $T$ is the
determinant $|T - xI|$ where $x$ is a variable. The characteristic equation is
$|T - xI| = 0$. The characteristic polynomial of a transformation $t$ is the
characteristic polynomial of any matrix representation $\mathrm{Rep}_{B,B}(t)$.

Exercise 32 checks that the characteristic polynomial of a transformation is 
well-defined, that is, that the characteristic polynomial is the same no matter 
which basis B we use for the representation. 

3.12 Lemma A linear transformation on a nontrivial vector space has at least 
one eigenvalue. 

Proof Any root of the characteristic polynomial is an eigenvalue. Over the 
complex numbers, any polynomial of degree one or greater has a root. QED 

3.13 Remark This result is the reason that in this chapter we've changed to using 
scalars that are complex. 

3.14 Definition The eigenspace of a transformation $t$ associated with the
eigenvalue $\lambda$ is $V_\lambda = \{ \vec{\zeta} \mid t(\vec{\zeta}) = \lambda\vec{\zeta} \}$. The eigenspace of a matrix is analogous.



3.15 Lemma An eigenspace is a subspace. 



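Proof An eigenspace is nonempty because it contains the zero vector since for
any linear transformation $t(\vec{0}) = \vec{0}$, which equals $\lambda\vec{0}$. Thus we need only check
closure of linear combinations. Take $\vec{\zeta}_1, \ldots, \vec{\zeta}_n \in V_\lambda$ and verify

$$\begin{aligned} t(c_1\vec{\zeta}_1 + c_2\vec{\zeta}_2 + \cdots + c_n\vec{\zeta}_n) &= c_1 t(\vec{\zeta}_1) + \cdots + c_n t(\vec{\zeta}_n) \\ &= c_1\lambda\vec{\zeta}_1 + \cdots + c_n\lambda\vec{\zeta}_n \\ &= \lambda(c_1\vec{\zeta}_1 + \cdots + c_n\vec{\zeta}_n) \end{aligned}$$

that the combination is also in $V_\lambda$ (despite that the zero vector isn't an eigenvector,
the second equality holds even if some $\vec{\zeta}_i$ is $\vec{0}$ since $t(\vec{0}) = \lambda\cdot\vec{0} = \vec{0}$).
QED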

3.16 Example In Example 3.10 the eigenspace associated with the eigenvalue $\pi$
and the eigenspace associated with the eigenvalue 3 are these.

$$V_\pi = \{ \begin{pmatrix} a \\ 0 \end{pmatrix} \mid a \in \mathbb{C} \} \qquad V_3 = \{ \begin{pmatrix} -b/(\pi - 3) \\ b \end{pmatrix} \mid b \in \mathbb{C} \}$$

3.17 Example In Example 3.9 these are the eigenspaces associated with the
eigenvalues 0 and 2.

$$V_0 = \{ \begin{pmatrix} a \\ -a \\ a \end{pmatrix} \mid a \in \mathbb{C} \} \qquad V_2 = \{ \begin{pmatrix} b \\ 0 \\ b \end{pmatrix} \mid b \in \mathbb{C} \}$$

The characteristic equation in Example 3.9 is $0 = x(x - 2)^2$ so in some sense
2 is an eigenvalue twice. However there are not twice as many eigenvectors in
that the dimension of the associated eigenspace $V_2$ is one, not two. The next
example is a case where a number is a double root of the characteristic equation
and the dimension of the associated eigenspace is two.

3.18 Example With respect to the standard bases, this matrix

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

represents projection.

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \overset{\pi}{\longmapsto} \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} \qquad x, y, z \in \mathbb{C}$$

Its eigenspace associated with the eigenvalue 0 and its eigenspace associated
with the eigenvalue 1 are easy to find.

$$V_0 = \{ \begin{pmatrix} 0 \\ 0 \\ c_3 \end{pmatrix} \mid c_3 \in \mathbb{C} \} \qquad V_1 = \{ \begin{pmatrix} c_1 \\ c_2 \\ 0 \end{pmatrix} \mid c_1, c_2 \in \mathbb{C} \}$$

By Lemma 3.15 if two eigenvectors $\vec{v}_1$ and $\vec{v}_2$ are associated with the same
eigenvalue then a nonzero linear combination of those two is also an eigenvector, associated
with the same eigenvalue. Thus, referring to the prior example, this sum of two
members of $V_1$

$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$

yields another member of $V_1$.

The next result speaks to the situation where the vectors come from different
eigenspaces.

3.19 Theorem For any set of distinct eigenvalues of a map or matrix, a set of
associated eigenvectors, one per eigenvalue, is linearly independent.

Proof We will use induction on the number of eigenvalues. If there is no
eigenvalue then the set of associated vectors is empty, and is linearly independent.
If there is only one eigenvalue then the set of associated eigenvectors is a singleton
set with a non-$\vec{0}$ member, and so is linearly independent.

For induction assume that the theorem is true for any set of $k$ distinct
eigenvalues. Consider distinct eigenvalues $\lambda_1, \ldots, \lambda_{k+1}$ and let $\vec{v}_1, \ldots, \vec{v}_{k+1}$
be associated eigenvectors. Suppose that $\vec{0} = c_1\vec{v}_1 + \cdots + c_k\vec{v}_k + c_{k+1}\vec{v}_{k+1}$.
Derive two equations from that, the first by multiplying by $\lambda_{k+1}$ on both sides

$$\vec{0} = c_1\lambda_{k+1}\vec{v}_1 + \cdots + c_{k+1}\lambda_{k+1}\vec{v}_{k+1}$$

and the second by applying the map to both sides

$$\vec{0} = c_1 t(\vec{v}_1) + \cdots + c_{k+1} t(\vec{v}_{k+1}) = c_1\lambda_1\vec{v}_1 + \cdots + c_{k+1}\lambda_{k+1}\vec{v}_{k+1}$$

(applying the matrix gives the same result). Subtract the second equation from the first

$$\vec{0} = c_1(\lambda_{k+1} - \lambda_1)\vec{v}_1 + \cdots + c_k(\lambda_{k+1} - \lambda_k)\vec{v}_k + c_{k+1}(\lambda_{k+1} - \lambda_{k+1})\vec{v}_{k+1}$$

so that the $\vec{v}_{k+1}$ term vanishes. Then the induction hypothesis gives that
$c_1(\lambda_{k+1} - \lambda_1) = 0, \ldots, c_k(\lambda_{k+1} - \lambda_k) = 0$. All of the eigenvalues are distinct
so $c_1, \ldots, c_k$ are all 0. With that, $c_{k+1}$ must be 0 because we are left with the
equation $\vec{0} = c_{k+1}\vec{v}_{k+1}$. QED

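3.20 Example The eigenvalues of

$$\begin{pmatrix} 2 & -2 & 2 \\ 0 & 1 & 1 \\ -4 & 8 & 3 \end{pmatrix}$$

are distinct: $\lambda_1 = 1$, $\lambda_2 = 2$, and $\lambda_3 = 3$. A set of associated eigenvectors

$$\{ \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 9 \\ 4 \\ 4 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix} \}$$

is linearly independent.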
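
Theorem 3.19 can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the text: it takes a particular $3 \times 3$ integer matrix `T` chosen to have eigenvalues 1, 2, 3, checks one eigenvector per eigenvalue, and then tests linear independence by checking that the determinant of the matrix with those eigenvectors as columns is nonzero. The helper names are hypothetical.

```python
# Illustrate Theorem 3.19: eigenvectors for distinct eigenvalues are independent.
T = [[2, -2, 2],
     [0, 1, 1],
     [-4, 8, 3]]

# (eigenvalue, eigenvector) pairs, one eigenvector per distinct eigenvalue.
pairs = [(1, [2, 1, 0]), (2, [9, 4, 4]), (3, [2, 1, 2])]

def mat_vec(m, v):
    """Matrix-vector product."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Each pair really is an eigenpair: T v = lambda v.
all_eigen = all(mat_vec(T, v) == [lam * z for z in v] for lam, v in pairs)

# Put the eigenvectors in as columns; a nonzero determinant means independence.
columns = [[pairs[c][1][r] for c in range(3)] for r in range(3)]
independent = det3(columns) != 0

print(all_eigen, independent)
```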

3.21 Corollary An $n \times n$ matrix with $n$ distinct eigenvalues is diagonalizable.

Proof Form a basis of eigenvectors. Apply Lemma 2.4. QED

Exercises

3.22 For each, find the characteristic polynomial and the eigenvalues. 

c: :i) G « (° c :) 

✓ 3.23 For each matrix, find the characteristic equation, and the eigenvalues and
associated eigenvectors.

") J) (J 

3.24 Find the characteristic equation, and the eigenvalues and associated eigenvectors
for this matrix. Hint. The eigenvalues are complex.

$$\begin{pmatrix} -2 & -1 \\ 5 & 2 \end{pmatrix}$$

3.25 Find the characteristic polynomial, the eigenvalues, and the associated eigen- 
vectors of this matrix. 

(0 1) 

\0 1/ 

✓ 3.26 For each matrix, find the characteristic equation, and the eigenvalues and
associated eigenvectors.

3.27 Let $t\colon \mathcal{P}_2 \to \mathcal{P}_2$ be

$$a_0 + a_1x + a_2x^2 \mapsto (5a_0 + 6a_1 + 2a_2) - (a_1 + 8a_2)x + (a_0 - 2a_2)x^2.$$

Find its eigenvalues and the associated eigenvectors.

3.28 Find the eigenvalues and eigenvectors of this map $t\colon \mathcal{M}_2 \to \mathcal{M}_2$.

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto \begin{pmatrix} 2c & a + c \\ b - 2c & d \end{pmatrix}$$

✓ 3.29 Find the eigenvalues and associated eigenvectors of the differentiation operator
$d/dx\colon \mathcal{P}_3 \to \mathcal{P}_3$.

3.30 Prove that the eigenvalues of a triangular matrix (upper or lower triangular) 
are the entries on the diagonal. 

✓ 3.31 Find the formula for the characteristic polynomial of a $2 \times 2$ matrix.

3.32 Prove that the characteristic polynomial of a transformation is well-defined.

3.33 Prove or disprove: if all the eigenvalues of a matrix are 0 then it must be the
zero matrix.

✓ 3.34 (a) Show that any non-$\vec{0}$ vector in any nontrivial vector space can be an
eigenvector. That is, given a $\vec{v} \neq \vec{0}$ from a nontrivial $V$, show that there is a
transformation $t\colon V \to V$ having a scalar eigenvalue $\lambda \in \mathbb{R}$ such that $\vec{v} \in V_\lambda$.
(b) What if we are given a scalar $\lambda$? Can any non-$\vec{0}$ member of any nontrivial
vector space be an eigenvector associated with $\lambda$?

✓ 3.35 Suppose that $t\colon V \to V$ and $T = \mathrm{Rep}_{B,B}(t)$. Prove that the eigenvectors of $T$
associated with $\lambda$ are the non-$\vec{0}$ vectors in the kernel of the map represented (with
respect to the same bases) by $T - \lambda I$.

3.36 Prove that if $a, \ldots, d$ are all integers and $a + b = c + d$ then

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

has integral eigenvalues, namely $a + b$ and $a - c$.

✓ 3.37 Prove that if $T$ is nonsingular and has eigenvalues $\lambda_1, \ldots, \lambda_n$ then $T^{-1}$ has
eigenvalues $1/\lambda_1, \ldots, 1/\lambda_n$. Is the converse true?

✓ 3.38 Suppose that $T$ is $n \times n$ and $c$, $d$ are scalars.

(a) Prove that if $T$ has the eigenvalue $\lambda$ with an associated eigenvector $\vec{v}$ then $\vec{v}$ is
an eigenvector of $cT + dI$ associated with eigenvalue $c\lambda + d$.

(b) Prove that if $T$ is diagonalizable then so is $cT + dI$.

✓ 3.39 Show that $\lambda$ is an eigenvalue of $T$ if and only if the map represented by $T - \lambda I$
is not an isomorphism.

3.40 [Strang 80]

(a) Show that if $\lambda$ is an eigenvalue of $A$ then $\lambda^k$ is an eigenvalue of $A^k$.

(b) What is wrong with this proof generalizing that? "If $\lambda$ is an eigenvalue of $A$
and $\mu$ is an eigenvalue for $B$, then $\lambda\mu$ is an eigenvalue for $AB$, for, if $A\vec{x} = \lambda\vec{x}$ and
$B\vec{x} = \mu\vec{x}$ then $AB\vec{x} = A\mu\vec{x} = \mu A\vec{x} = \mu\lambda\vec{x}$"?

3.41 Do matrix equivalent matrices have the same eigenvalues?

3.42 Show that a square matrix with real entries and an odd number of rows has at
least one real eigenvalue.
3.43 Diagonalize.

$$\begin{pmatrix} -1 & 2 & 2 \\ 2 & 2 & 2 \\ -3 & -6 & -6 \end{pmatrix}$$

3.44 Suppose that $P$ is a nonsingular $n \times n$ matrix. Show that the similarity
transformation map $t_P\colon \mathcal{M}_{n \times n} \to \mathcal{M}_{n \times n}$ sending $T \mapsto PTP^{-1}$ is an isomorphism.

? 3.45 [Math. Mag., Nov. 1967] Show that if $A$ is an $n$ square matrix and each row
(column) sums to $c$ then $c$ is a characteristic root of $A$. ("Characteristic root" is a
synonym for eigenvalue.)
III Nilpotence

The goal of this chapter is to show that every square matrix is similar to one
that is a sum of two kinds of simple matrices. The prior section focused on the
first simple kind, diagonal matrices. We now consider the other.

III.1 Self-Composition

Because a linear transformation $t\colon V \to V$ has the same domain as codomain,
we can find the composition of $t$ with itself $t^2 = t \circ t$, and $t^3 = t \circ t \circ t$, etc.*

Note that the superscript power notation $t^j$ for iterates of the transformations
dovetails with the notation that we've used for their square matrix representations
because if $\mathrm{Rep}_{B,B}(t) = T$ then $\mathrm{Rep}_{B,B}(t^j) = T^j$.

1.1 Example For the derivative map $d/dx\colon \mathcal{P}_3 \to \mathcal{P}_3$ given by

$$a + bx + cx^2 + dx^3 \overset{d/dx}{\longmapsto} b + 2cx + 3dx^2$$

the second power is the second derivative

$$a + bx + cx^2 + dx^3 \overset{d^2/dx^2}{\longmapsto} 2c + 6dx$$

the third power is the third derivative

$$a + bx + cx^2 + dx^3 \overset{d^3/dx^3}{\longmapsto} 6d$$

and any higher power is the zero map.

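1.2 Example This transformation of the space of $2 \times 2$ matrices

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \overset{t}{\longmapsto} \begin{pmatrix} b & a \\ d & 0 \end{pmatrix}$$

has this second power

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \overset{t^2}{\longmapsto} \begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix}$$

and this third power.

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \overset{t^3}{\longmapsto} \begin{pmatrix} b & a \\ 0 & 0 \end{pmatrix}$$

After that, $t^4 = t^2$ and $t^5 = t^3$, etc.

* More information on function iteration is in the appendix.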

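1.3 Example Consider the shift transformation $t\colon \mathbb{C}^3 \to \mathbb{C}^3$.

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \overset{t}{\longmapsto} \begin{pmatrix} 0 \\ x \\ y \end{pmatrix}$$

We have that

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \overset{t}{\longmapsto} \begin{pmatrix} 0 \\ x \\ y \end{pmatrix} \overset{t}{\longmapsto} \begin{pmatrix} 0 \\ 0 \\ x \end{pmatrix} \overset{t}{\longmapsto} \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

so the range spaces descend to the trivial subspace.

$$\mathscr{R}(t) = \{ \begin{pmatrix} 0 \\ a \\ b \end{pmatrix} \mid a, b \in \mathbb{C} \} \qquad \mathscr{R}(t^2) = \{ \begin{pmatrix} 0 \\ 0 \\ c \end{pmatrix} \mid c \in \mathbb{C} \} \qquad \mathscr{R}(t^3) = \{ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \}$$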

These examples suggest that after some number of iterations the map settles 
down. 

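1.4 Lemma For any transformation $t\colon V \to V$, the range spaces of the powers
form a descending chain

$$V \supseteq \mathscr{R}(t) \supseteq \mathscr{R}(t^2) \supseteq \cdots$$

and the null spaces form an ascending chain.

$$\{\vec{0}\} \subseteq \mathscr{N}(t) \subseteq \mathscr{N}(t^2) \subseteq \cdots$$

Further, there is a $k$ such that for powers less than $k$ the subsets are proper so that
if $j < k$ then $\mathscr{R}(t^j) \supset \mathscr{R}(t^{j+1})$ and $\mathscr{N}(t^j) \subset \mathscr{N}(t^{j+1})$, while for higher powers the
sets are equal, that is, if $j \geq k$ then $\mathscr{R}(t^j) = \mathscr{R}(t^{j+1})$ and $\mathscr{N}(t^j) = \mathscr{N}(t^{j+1})$.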

Proof First recall that for any map the dimension of its range space plus 
the dimension of its null space equals the dimension of its domain. So if the 
dimensions of the range spaces shrink then the dimensions of the null spaces must 
grow. We will do the range space half here and leave the rest for Exercise 14. 

We start by showing that the range spaces form a chain. If $\vec{w} \in \mathscr{R}(t^{j+1})$, so
that $\vec{w} = t^{j+1}(\vec{v})$, then $\vec{w} = t^j(t(\vec{v}))$. Thus $\vec{w} \in \mathscr{R}(t^j)$.

Next we verify the "further" property: while the subsets in the chain of
range spaces may be proper for a while, from some power $k$ onward the range
spaces are equal. We first show that if any pair of adjacent range spaces in
the chain are equal $\mathscr{R}(t^k) = \mathscr{R}(t^{k+1})$ then all subsequent ones are also equal
$\mathscr{R}(t^{k+1}) = \mathscr{R}(t^{k+2})$, etc. This holds because $t\colon \mathscr{R}(t^{k+1}) \to \mathscr{R}(t^{k+2})$ is the same
map, with the same domain, as $t\colon \mathscr{R}(t^k) \to \mathscr{R}(t^{k+1})$ and it therefore has the
same range $\mathscr{R}(t^{k+1}) = \mathscr{R}(t^{k+2})$ (it holds for all higher powers by induction). So
if the chain of range spaces ever stops strictly decreasing then from that point
onward it is stable.

We end by showing that the chain must eventually stop decreasing. Each 
range space is a subspace of the one before it. For it to be a proper subspace it 
must be of strictly lower dimension (see Exercise 12). These spaces are finite- 
dimensional and so the chain can fall for only finitely-many steps, that is, the 
power k is at most the dimension of V. QED 

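1.5 Example The derivative map $a + bx + cx^2 + dx^3 \overset{d/dx}{\longmapsto} b + 2cx + 3dx^2$ on $\mathcal{P}_3$
has this chain of range spaces

$$\mathscr{R}(t^0) = \mathcal{P}_3 \supset \mathscr{R}(t^1) = \mathcal{P}_2 \supset \mathscr{R}(t^2) = \mathcal{P}_1 \supset \mathscr{R}(t^3) = \mathcal{P}_0 \supset \mathscr{R}(t^4) = \{\vec{0}\}$$

(all later elements of the chain are the trivial space). And it has this chain of
null spaces

$$\mathscr{N}(t^0) = \{\vec{0}\} \subset \mathscr{N}(t^1) = \mathcal{P}_0 \subset \mathscr{N}(t^2) = \mathcal{P}_1 \subset \mathscr{N}(t^3) = \mathcal{P}_2 \subset \mathscr{N}(t^4) = \mathcal{P}_3$$

(later elements are the entire space).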
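
The chain in Example 1.5 can be reproduced by computing ranks of matrix powers. The following is a minimal sketch, assuming the derivative is represented with respect to the basis $\langle 1, x, x^2, x^3 \rangle$ of $\mathcal{P}_3$; the helper names `rank` and `mat_mul` are illustrative, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(m):
    """Rank by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in r] for r in m]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                f = rows[r][col] / rows[rk][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk

# Column j holds the representation of d/dx applied to the j-th basis vector:
# 1 -> 0, x -> 1, x^2 -> 2x, x^3 -> 3x^2.
D = [[0, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3],
     [0, 0, 0, 0]]

n = len(D)
power = [[int(i == j) for j in range(n)] for i in range(n)]  # t^0 is the identity
ranks = []
for j in range(n + 1):
    ranks.append(rank(power))       # dimension of the range space of t^j
    power = mat_mul(power, D)
nullities = [n - r for r in ranks]  # rank plus nullity equals n

print(ranks)      # [4, 3, 2, 1, 0]
print(nullities)  # [0, 1, 2, 3, 4]
```

The ranks fall by one at each step and the nullities rise by one, matching the dimensions of $\mathcal{P}_3, \mathcal{P}_2, \mathcal{P}_1, \mathcal{P}_0, \{\vec{0}\}$.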

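1.6 Example Let $t\colon \mathcal{P}_2 \to \mathcal{P}_2$ be the map $c_0 + c_1x + c_2x^2 \mapsto 2c_0 + c_2x$. As the
lemma describes, on iteration the range space shrinks

$$\mathscr{R}(t^0) = \mathcal{P}_2 \qquad \mathscr{R}(t) = \{ a + bx \mid a, b \in \mathbb{C} \} \qquad \mathscr{R}(t^2) = \{ a \mid a \in \mathbb{C} \}$$

and then stabilizes $\mathscr{R}(t^2) = \mathscr{R}(t^3) = \cdots$ while the null space grows

$$\mathscr{N}(t^0) = \{ 0 \} \qquad \mathscr{N}(t) = \{ cx \mid c \in \mathbb{C} \} \qquad \mathscr{N}(t^2) = \{ cx + dx^2 \mid c, d \in \mathbb{C} \}$$

and then stabilizes $\mathscr{N}(t^2) = \mathscr{N}(t^3) = \cdots$.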

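1.7 Example The transformation $\pi\colon \mathbb{C}^3 \to \mathbb{C}^3$ projecting onto the first two coordinates

$$\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} \overset{\pi}{\longmapsto} \begin{pmatrix} c_1 \\ c_2 \\ 0 \end{pmatrix}$$

has $\mathbb{C}^3 \supset \mathscr{R}(\pi) = \mathscr{R}(\pi^2) = \cdots$ and $\{\vec{0}\} \subset \mathscr{N}(\pi) = \mathscr{N}(\pi^2) = \cdots$ where this is
the range space and the null space.

$$\mathscr{R}(\pi) = \{ \begin{pmatrix} a \\ b \\ 0 \end{pmatrix} \mid a, b \in \mathbb{C} \} \qquad \mathscr{N}(\pi) = \{ \begin{pmatrix} 0 \\ 0 \\ c \end{pmatrix} \mid c \in \mathbb{C} \}$$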



1.8 Definition Let $t$ be a transformation on an $n$-dimensional space. The generalized
range space (or the closure of the range space) is $\mathscr{R}_\infty(t) = \mathscr{R}(t^n)$. The
generalized null space (or the closure of the null space) is $\mathscr{N}_\infty(t) = \mathscr{N}(t^n)$.
This graph illustrates. The horizontal axis gives the power $j$ of a transformation.
The vertical axis gives the dimension of the range space of $t^j$ as the
distance above zero, and thus also shows the dimension of the null space because
the two add to the dimension $n$ of the domain.

[Figure: $\mathrm{rank}(t^j)$ plotted against the power $j = 0, 1, 2, \ldots, n$. The height falls from $n$ through $\dim(\mathscr{R}(t))$ to $\dim(\mathscr{R}_\infty(t))$; the distance from the curve up to $n$ is $\mathrm{nullity}(t^j)$.]

On iteration the rank falls and the nullity rises until there is some $k$ such
that the map reaches a steady state $\mathscr{R}(t^k) = \mathscr{R}(t^{k+1}) = \mathscr{R}_\infty(t)$ and $\mathscr{N}(t^k) =
\mathscr{N}(t^{k+1}) = \mathscr{N}_\infty(t)$. This must happen by the $n$-th iterate.

Exercises 

✓ 1.9 Give the chains of range spaces and null spaces for the zero and identity
transformations.

✓ 1.10 For each map, give the chain of range spaces and the chain of null spaces, and
the generalized range space and the generalized null space.

(a) $t_0\colon \mathcal{P}_2 \to \mathcal{P}_2$, $a + bx + cx^2 \mapsto b + cx^2$

(b) $t_1\colon \mathbb{R}^2 \to \mathbb{R}^2$, $\begin{pmatrix} a \\ b \end{pmatrix} \mapsto \begin{pmatrix} 0 \\ a \end{pmatrix}$

(c) $t_2\colon \mathcal{P}_2 \to \mathcal{P}_2$, $a + bx + cx^2 \mapsto b + cx + ax^2$

(d) $t_3\colon \mathbb{R}^3 \to \mathbb{R}^3$, $\begin{pmatrix} a \\ b \\ c \end{pmatrix} \mapsto \begin{pmatrix} a \\ a \\ b \end{pmatrix}$

1.11 Prove that function composition is associative $(t \circ t) \circ t = t \circ (t \circ t)$ and so we
can write $t^3$ without specifying a grouping.

1.12 Check that a subspace must be of dimension less than or equal to the dimension
of its superspace. Check that if the subspace is proper (the subspace does not equal
the superspace) then the dimension is strictly less. (This is used in the proof of
Lemma 1.4.)

✓ 1.13 Prove that the generalized range space $\mathscr{R}_\infty(t)$ is the entire space, and the
generalized null space $\mathscr{N}_\infty(t)$ is trivial, if the transformation $t$ is nonsingular. Is
this 'only if' also?

1.14 Verify the null space half of Lemma 1.4.

✓ 1.15 Give an example of a transformation on a three dimensional space whose range
has dimension two. What is its null space? Iterate your example until the range
space and null space stabilize.

1.16 Show that the range space and null space of a linear transformation need not
be disjoint. Are they ever disjoint?
III.2 Strings

This subsection requires material from the optional Direct Sum subsection.

The prior subsection shows that as $j$ increases, the dimensions of the $\mathscr{R}(t^j)$'s
fall while the dimensions of the $\mathscr{N}(t^j)$'s rise, in such a way that this rank and
nullity split between them the dimension of $V$. Can we say more; do the two
split a basis, that is, is $V = \mathscr{R}(t^j) \oplus \mathscr{N}(t^j)$?

The answer is yes for the smallest power $j = 0$ since $V = \mathscr{R}(t^0) \oplus \mathscr{N}(t^0) =
V \oplus \{\vec{0}\}$. The answer is also yes at the other extreme.

2.1 Lemma For any linear $t\colon V \to V$ the function $t\colon \mathscr{R}_\infty(t) \to \mathscr{R}_\infty(t)$ is one-to-one.

Proof Let the dimension of $V$ be $n$. Because $\mathscr{R}(t^n) = \mathscr{R}(t^{n+1})$, the map
$t\colon \mathscr{R}_\infty(t) \to \mathscr{R}_\infty(t)$ is a dimension-preserving homomorphism. Therefore, by
Theorem Two.II.2.21 it is one-to-one. QED

2.2 Corollary Where $t\colon V \to V$ is a linear transformation, the space is the direct
sum $V = \mathscr{R}_\infty(t) \oplus \mathscr{N}_\infty(t)$. That is, both (1) $\dim(V) = \dim(\mathscr{R}_\infty(t)) + \dim(\mathscr{N}_\infty(t))$
and (2) $\mathscr{R}_\infty(t) \cap \mathscr{N}_\infty(t) = \{\vec{0}\}$.

Proof Let the dimension of $V$ be $n$. We will verify the second sentence, which
is equivalent to the first. Clause (1) is true because any transformation satisfies
that its rank plus its nullity equals the dimension of the space, and in particular
this holds for the transformation $t^n$. Thus we need only verify clause (2).

Assume that $\vec{v} \in \mathscr{R}_\infty(t) \cap \mathscr{N}_\infty(t)$ to prove that $\vec{v} = \vec{0}$. Because $\vec{v}$ is in
the generalized null space, $t^n(\vec{v}) = \vec{0}$. On the other hand, by the lemma
$t\colon \mathscr{R}_\infty(t) \to \mathscr{R}_\infty(t)$ is one-to-one, and a composition of one-to-one maps is
one-to-one, so $t^n\colon \mathscr{R}_\infty(t) \to \mathscr{R}_\infty(t)$ is one-to-one. Only $\vec{0}$ is sent by a one-to-one
linear map to $\vec{0}$ so the fact that $t^n(\vec{v}) = \vec{0}$ implies that $\vec{v} = \vec{0}$. QED
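
As a worked instance of Corollary 2.2, consider the map $t\colon \mathcal{P}_2 \to \mathcal{P}_2$ of Example 1.6, with $c_0 + c_1x + c_2x^2 \mapsto 2c_0 + c_2x$. The sketch below computes $t^3$ directly from that formula (here $n = 3$) and reads off the two generalized spaces; it uses only the definitions already stated.

```latex
\begin{align*}
t^2(c_0 + c_1x + c_2x^2) &= t(2c_0 + c_2x) = 4c_0 \\
t^3(c_0 + c_1x + c_2x^2) &= t(4c_0) = 8c_0 \\
\mathscr{R}_\infty(t) = \mathscr{R}(t^3) &= \{\, a \mid a \in \mathbb{C} \,\} \\
\mathscr{N}_\infty(t) = \mathscr{N}(t^3) &= \{\, c_1x + c_2x^2 \mid c_1, c_2 \in \mathbb{C} \,\} \\
\dim\mathscr{R}_\infty(t) + \dim\mathscr{N}_\infty(t) &= 1 + 2 = 3 = \dim\mathcal{P}_2 \\
\mathscr{R}_\infty(t) \cap \mathscr{N}_\infty(t) &= \{\, 0 \,\}
\end{align*}
```

A nonzero constant has no $x$ or $x^2$ term, so the intersection is trivial, and every quadratic splits uniquely as its constant term plus the rest.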

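2.3 Remark Technically there is a difference between the map $t\colon V \to V$ and the
map on the subspace $t\colon \mathscr{R}_\infty(t) \to \mathscr{R}_\infty(t)$ if the generalized range space is not
equal to $V$, because the domains are different. The second is the restriction* of
the first to $\mathscr{R}_\infty(t)$.

* More information on map restrictions is in the appendix.

For powers between $j = 0$ and $j = n$, the space $V$ might not be the direct
sum of $\mathscr{R}(t^j)$ and $\mathscr{N}(t^j)$. The next example shows that the two can have a
nontrivial intersection.

2.4 Example Consider the transformation of $\mathbb{C}^2$ defined by this action on the
elements of the standard basis.

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \overset{n}{\longmapsto} \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad \begin{pmatrix} 0 \\ 1 \end{pmatrix} \overset{n}{\longmapsto} \begin{pmatrix} 0 \\ 0 \end{pmatrix} \qquad N = \mathrm{Rep}_{\mathcal{E}_2,\mathcal{E}_2}(n) = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$$

This is a shift map.

$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} 0 \\ x \end{pmatrix}$$

Another way to depict this map's action is with a string.

$$\vec{e}_1 \mapsto \vec{e}_2 \mapsto \vec{0}$$

The vector

$$\vec{e}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

is in both the range space and null space.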

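2.5 Example A map $\hat{n}\colon \mathbb{C}^4 \to \mathbb{C}^4$ whose action on $\mathcal{E}_4$ is given by the string

$$\vec{e}_1 \mapsto \vec{e}_2 \mapsto \vec{e}_3 \mapsto \vec{e}_4 \mapsto \vec{0}$$

has $\mathscr{R}(\hat{n}) \cap \mathscr{N}(\hat{n})$ equal to the span $[\{\vec{e}_4\}]$, has $\mathscr{R}(\hat{n}^2) \cap \mathscr{N}(\hat{n}^2) = [\{\vec{e}_3, \vec{e}_4\}]$, and
has $\mathscr{R}(\hat{n}^3) \cap \mathscr{N}(\hat{n}^3) = [\{\vec{e}_4\}]$. The matrix representation is all zeros except for
some subdiagonal ones.

$$\hat{N} = \mathrm{Rep}_{\mathcal{E}_4,\mathcal{E}_4}(\hat{n}) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$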

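2.6 Example Transformations can act via more than one string. A transformation
$t$ acting on a basis $B = \langle \vec{\beta}_1, \ldots, \vec{\beta}_5 \rangle$ by

$$\begin{array}{l} \vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{\beta}_3 \mapsto \vec{0} \\ \vec{\beta}_4 \mapsto \vec{\beta}_5 \mapsto \vec{0} \end{array}$$

is represented by a matrix that is all zeros except for blocks of subdiagonal ones

$$\mathrm{Rep}_{B,B}(t) = \left(\begin{array}{ccc|cc} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ \hline 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{array}\right)$$

(the lines just visually organize the blocks).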

In those examples all vectors are eventually transformed to zero. 

2.7 Definition A nilpotent transformation is one with a power that is the zero 
map. A nilpotent matrix is one with a power that is the zero matrix. In either 
case, the least such power is the index of nilpotency. 



2.8 Example In Example 2.4 the index of nilpotency is two. In Example 2.5 it is 
four. In Example 2.6 it is three. 
2.9 Example The differentiation map $d/dx\colon \mathcal{P}_2 \to \mathcal{P}_2$ is nilpotent of index three
since the third derivative of any quadratic polynomial is zero. This map's action
is described by the string $x^2 \mapsto 2x \mapsto 2 \mapsto 0$ and taking the basis $B = \langle x^2, 2x, 2 \rangle$
gives this representation.

$$\mathrm{Rep}_{B,B}(d/dx) = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

Not all nilpotent matrices are all zeros except for blocks of subdiagonal ones. 

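2.10 Example With the matrix $\hat{N}$ from Example 2.5, and this four-vector basis

$$D = \langle \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} \rangle$$

a change of basis operation produces this representation with respect to $D, D$.

$$\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 2 & 1 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 2 & 1 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} -1 & 0 & 1 & 0 \\ -3 & -2 & 5 & 0 \\ -2 & -1 & 3 & 0 \\ 2 & 1 & -2 & 0 \end{pmatrix}$$

The new matrix is nilpotent; its fourth power is the zero matrix. We could
verify this with a tedious computation, or we can observe that it is nilpotent
since it is similar to the nilpotent matrix $\hat{N}$.

$$(P\hat{N}P^{-1})^4 = P\hat{N}P^{-1} \cdot P\hat{N}P^{-1} \cdot P\hat{N}P^{-1} \cdot P\hat{N}P^{-1} = P\hat{N}^4P^{-1}$$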
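
The "tedious computation" mentioned in Example 2.10 is easy to delegate to a machine. Here is a minimal sketch in plain Python; the names `Q`, `P`, `N_hat`, and `index_of_nilpotency` are illustrative assumptions. It checks the similarity in the inverse-free form $QP = P\hat{N}$ and then finds the least power of $Q$ that is the zero matrix.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def index_of_nilpotency(m):
    """Least k >= 1 with m^k = 0, or None if no power up to the size vanishes.

    A nilpotent n x n matrix has index at most n, so n steps suffice.
    """
    n = len(m)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, n + 1):
        power = mat_mul(power, m)
        if all(entry == 0 for row in power for entry in row):
            return k
    return None

# The subdiagonal-ones matrix for the string e1 -> e2 -> e3 -> e4 -> 0.
N_hat = [[0, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0]]

# The change of basis matrix and the resulting representation.
P = [[1, 0, 1, 0],
     [0, 2, 1, 0],
     [1, 1, 1, 0],
     [0, 0, 0, 1]]
Q = [[-1, 0, 1, 0],
     [-3, -2, 5, 0],
     [-2, -1, 3, 0],
     [2, 1, -2, 0]]

# Q = P N_hat P^{-1} is equivalent to Q P = P N_hat, which avoids inverting P.
similar = mat_mul(Q, P) == mat_mul(P, N_hat)

print(similar, index_of_nilpotency(N_hat), index_of_nilpotency(Q))
```

Both matrices have the same index, as similarity requires.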

The goal of this subsection is to show that the prior example is prototypical 
in that every nilpotent matrix is similar to one that is all zeros except for blocks 
of subdiagonal ones. 

2.11 Definition Let $t$ be a nilpotent transformation on $V$. A $t$-string of length
$k$ generated by $\vec{v} \in V$ is a sequence $\langle \vec{v}, t(\vec{v}), \ldots, t^{k-1}(\vec{v}) \rangle$. A $t$-string basis is a
basis that is a concatenation of $t$-strings.

2.12 Example Consider differentiation $d/dx\colon \mathcal{P}_2 \to \mathcal{P}_2$. The sequence $\langle x^2, 2x, 2, 0 \rangle$
is a $d/dx$-string of length 4. The sequence $\langle x^2, 2x, 2 \rangle$ is a $d/dx$-string of length 3
that is a basis for $\mathcal{P}_2$.

Note that the strings cannot form a basis under concatenation if they are
not disjoint because a basis cannot have a repeated vector.

2.13 Example In Example 2.6, we can concatenate the $t$-strings $\langle \vec{\beta}_1, \vec{\beta}_2, \vec{\beta}_3 \rangle$ and
$\langle \vec{\beta}_4, \vec{\beta}_5 \rangle$, of length three and two, to make a basis for the domain of $t$.
2.14 Lemma If a space has a basis of $t$-strings then the longest string in that
basis has length equal to the index of nilpotency of $t$.

Proof Assume the space has a basis of $t$-strings and that $t$'s index of nilpotency
is $k$. We cannot have that the longest string in that basis is longer than $t$'s index
of nilpotency: $t^k$ sends any vector, including the vector starting the longest
string, to $\vec{0}$. So suppose instead that the space has a $t$-string basis $B$ where all of
the strings are shorter than length $k$. Because $t$ has index $k$, there is a vector $\vec{v}$
such that $t^{k-1}(\vec{v}) \neq \vec{0}$. Represent $\vec{v}$ as a linear combination of elements from $B$
and apply $t^{k-1}$. We are supposing that $t^{k-1}$ maps each element of $B$ to $\vec{0}$, and
therefore each summand in the linear combination to $\vec{0}$, but also that it does
not map $\vec{v}$ to $\vec{0}$. That is impossible. QED

We shall show that every nilpotent map has an associated string basis. 

We first see the main idea of the argument by considering an example. If we
want to construct a counterexample, a nilpotent map without an associated
disjoint string basis, we would try something like the map $t\colon \mathbb{C}^5 \to \mathbb{C}^5$ with this
action.

$$\begin{array}{l} \vec{e}_1 \mapsto \vec{e}_3 \mapsto \vec{0} \\ \vec{e}_2 \mapsto \vec{e}_3 \mapsto \vec{0} \\ \vec{e}_4 \mapsto \vec{e}_5 \mapsto \vec{0} \end{array} \qquad \mathrm{Rep}_{\mathcal{E}_5,\mathcal{E}_5}(t) = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

The action on the basis shows that this map is nilpotent but this is not a disjoint
string basis, because the first two of the three strings aren't disjoint. However,
the fact that this basis isn't disjoint doesn't mean there is no disjoint string
basis. To produce such a basis for this map we will first find the number and
lengths of its strings.

Since $t$'s index of nilpotency is two, Lemma 2.14 says that at least one string
in the basis has length two. There are five basis elements so if there is a disjoint
string basis then the map must act on it in one of these two ways.

$$\begin{array}{l} \vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{0} \\ \vec{\beta}_3 \mapsto \vec{\beta}_4 \mapsto \vec{0} \\ \vec{\beta}_5 \mapsto \vec{0} \end{array} \qquad\qquad \begin{array}{l} \vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{0} \\ \vec{\beta}_3 \mapsto \vec{0} \\ \vec{\beta}_4 \mapsto \vec{0} \\ \vec{\beta}_5 \mapsto \vec{0} \end{array}$$

Now, the key point. A transformation with the left-hand action has a null space
of dimension three since that's how many basis vectors are mapped to zero. A
transformation with the right-hand action has a null space of dimension four.
Thus, using the matrix representation above, we can determine which of the two
possible shapes is right by calculating $t$'s null space.

$$\mathscr{N}(t) = \{ \begin{pmatrix} x \\ -x \\ z \\ 0 \\ r \end{pmatrix} \mid x, z, r \in \mathbb{C} \}$$

It is three-dimensional, meaning that this $t$ has the left-hand action, since in
the right-hand action the number of basis vectors mapped to zero is 4.
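To produce a string basis first pick $\vec{\beta}_2$ and $\vec{\beta}_4$ from $\mathscr{R}(t) \cap \mathscr{N}(t)$.

$$\vec{\beta}_2 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} \qquad \vec{\beta}_4 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$

(Other choices are possible, just be sure that the set $\{\vec{\beta}_2, \vec{\beta}_4\}$ is linearly
independent.) For $\vec{\beta}_5$ pick a vector from $\mathscr{N}(t)$ that is not in the span of $\{\vec{\beta}_2, \vec{\beta}_4\}$.

$$\vec{\beta}_5 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$

Finally, take $\vec{\beta}_1$ and $\vec{\beta}_3$ such that $t(\vec{\beta}_1) = \vec{\beta}_2$ and $t(\vec{\beta}_3) = \vec{\beta}_4$.

$$\vec{\beta}_1 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \qquad \vec{\beta}_3 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}$$

Therefore, we have a string basis $B = \langle \vec{\beta}_1, \ldots, \vec{\beta}_5 \rangle$ and with respect to that basis
the matrix of $t$ has blocks of subdiagonal 1's.

$$\mathrm{Rep}_{B,B}(t) = \left(\begin{array}{cc|cc|c} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ \hline 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ \hline 0 & 0 & 0 & 0 & 0 \end{array}\right)$$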
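
The string basis just constructed can be checked directly. The sketch below is an illustration with hypothetical helper names (`apply`, `rank`); it applies the standard-basis matrix of $t$ to each $\vec{\beta}_i$, confirms the three strings $\vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{0}$, $\vec{\beta}_3 \mapsto \vec{\beta}_4 \mapsto \vec{0}$, $\vec{\beta}_5 \mapsto \vec{0}$, and verifies that the five vectors are linearly independent.

```python
from fractions import Fraction

# Matrix of t with respect to the standard basis of C^5 (e1, e2 -> e3; e4 -> e5).
T = [[0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 1, 0]]

beta = {
    1: [0, 1, 0, 0, 0],
    2: [0, 0, 1, 0, 0],
    3: [0, 0, 0, 1, 0],
    4: [0, 0, 0, 0, 1],
    5: [1, -1, 0, 0, 0],
}
zero = [0, 0, 0, 0, 0]

def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]

def rank(m):
    """Rank by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in r] for r in m]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                f = rows[r][col] / rows[rk][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk

# The three strings: beta1 -> beta2 -> 0, beta3 -> beta4 -> 0, beta5 -> 0.
strings_ok = (apply(T, beta[1]) == beta[2] and apply(T, beta[2]) == zero and
              apply(T, beta[3]) == beta[4] and apply(T, beta[4]) == zero and
              apply(T, beta[5]) == zero)

# Five independent vectors in a five-dimensional space form a basis.
is_basis = rank([beta[i] for i in range(1, 6)]) == 5

# The null space is three-dimensional, which selects the left-hand shape.
nullity = 5 - rank(T)

print(strings_ok, is_basis, nullity)
```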



2.15 Theorem Any nilpotent transformation t is associated with a t-string basis. 
While the basis is not unique, the number and the length of the strings is 
determined by t. 
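This illustrates the proof, which describes three kinds of basis vectors (shown
in squares if they are in the null space and in circles if they are not).

[Figure: the basis vectors arranged in strings. Each string starts with a vector of kind 3, continues through vectors of kind 1, and ends with a kind 1 vector in a square that maps to $\vec{0}$; the remaining vectors, of kind 2, are in squares and map directly to $\vec{0}$.]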

Proof Fix a vector space $V$. We will argue by induction on the index of
nilpotency. If the map $t\colon V \to V$ has index of nilpotency 1 then it is the zero
map and any basis is a string basis $\vec{\beta}_1 \mapsto \vec{0}, \ldots, \vec{\beta}_n \mapsto \vec{0}$.

For the inductive step, assume that the theorem holds for any transformation
$t\colon V \to V$ with an index of nilpotency between 1 and $k - 1$ (with $k > 1$) and
consider the index $k$ case.

Observe that the restriction of $t$ to the range space $t\colon \mathscr{R}(t) \to \mathscr{R}(t)$ is also
nilpotent, of index $k - 1$. Apply the inductive hypothesis to get a string basis
for $\mathscr{R}(t)$, where the number and length of the strings is determined by $t$.

$$B = \langle \vec{\beta}_1, t(\vec{\beta}_1), \ldots, t^{h_1}(\vec{\beta}_1) \rangle {}^\frown \langle \vec{\beta}_2, \ldots, t^{h_2}(\vec{\beta}_2) \rangle {}^\frown \cdots {}^\frown \langle \vec{\beta}_i, \ldots, t^{h_i}(\vec{\beta}_i) \rangle$$

(In the illustration above these are the vectors of kind 1.)

Note that taking the final nonzero vector in each of these strings gives a basis
$C = \langle t^{h_1}(\vec{\beta}_1), \ldots, t^{h_i}(\vec{\beta}_i) \rangle$ for the intersection $\mathscr{R}(t) \cap \mathscr{N}(t)$. This is because a
member of $\mathscr{R}(t)$ maps to zero if and only if it is a linear combination of those
basis vectors that map to zero. (The illustration shows these as 1's in squares.)

Now extend $C$ to a basis for all of $\mathscr{N}(t)$.

$$\hat{C} = C {}^\frown \langle \vec{\xi}_1, \ldots, \vec{\xi}_p \rangle$$

(In the illustration the $\vec{\xi}$'s are the vectors of kind 2 and so the set $\hat{C}$ is the
set of all vectors in a square.) While which vectors $\vec{\xi}$ we choose isn't uniquely
determined by $t$, what is uniquely determined is the number of them: it is the
dimension of $\mathscr{N}(t)$ minus the dimension of $\mathscr{R}(t) \cap \mathscr{N}(t)$.

Finally, $B {}^\frown \hat{C}$ (with the vectors of $C$, which already appear in $B$, listed only once)
is a basis for $\mathscr{R}(t) + \mathscr{N}(t)$ because any sum of something in the
range space with something in the null space can be represented using elements
of $B$ for the range space part and elements of $\hat{C}$ for the part from the null space.
Note that

$$\begin{aligned} \dim(\mathscr{R}(t) + \mathscr{N}(t)) &= \dim(\mathscr{R}(t)) + \dim(\mathscr{N}(t)) - \dim(\mathscr{R}(t) \cap \mathscr{N}(t)) \\ &= \mathrm{rank}(t) + \mathrm{nullity}(t) - i \\ &= \dim(V) - i \end{aligned}$$

and so we can extend $B {}^\frown \hat{C}$ to a basis for all of $V$ by the addition of $i$ more vectors,
provided they are not linearly dependent on what we have already. Recall that
each of $\vec{\beta}_1, \ldots, \vec{\beta}_i$ is in $\mathscr{R}(t)$, and extend $B {}^\frown \hat{C}$ with vectors $\vec{v}_1, \ldots, \vec{v}_i$ such that
$t(\vec{v}_1) = \vec{\beta}_1, \ldots, t(\vec{v}_i) = \vec{\beta}_i$. (In the illustration these are the 3's.) The check
that this extension preserves linear independence is Exercise 31. QED



2.16 Corollary Every nilpotent matrix is similar to a matrix that is all zeros except 
for blocks of subdiagonal ones. That is, every nilpotent map is represented with 
respect to some basis by such a matrix. 



This form is unique in the sense that if a nilpotent matrix is similar to two 
such matrices then those two simply have their blocks ordered differently. Thus 
this is a canonical form for the similarity classes of nilpotent matrices provided 
that we order the blocks, say, from longest to shortest. 

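2.17 Example The matrix

$$M = \begin{pmatrix} 1 & -1 \\ 1 & -1 \end{pmatrix}$$

has an index of nilpotency of two, as this calculation shows.

$$\begin{array}{c|c|c} \text{power } p & M^p & \mathscr{N}(M^p) \\ \hline 1 & M = \begin{pmatrix} 1 & -1 \\ 1 & -1 \end{pmatrix} & \{ \begin{pmatrix} x \\ x \end{pmatrix} \mid x \in \mathbb{C} \} \\ 2 & M^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} & \mathbb{C}^2 \end{array}$$

The calculation also describes how a map $m$ represented by $M$ must act on any
string basis. With one map application the null space has dimension one and so
one vector of the basis maps to zero. On a second application, the null space has
dimension two and so the other basis vector maps to zero. Thus, the action of
the linear transformation is $\vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{0}$ and the canonical form of the matrix
is this.

$$\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$$

We can exhibit such an $m$-string basis and the change of basis matrices
witnessing the matrix similarity. For the basis, take $M$ to represent $m$ with
respect to the standard bases. (We could take $M$ to be a representative with
respect to some other basis but the standard basis is convenient.) Pick a
$\vec{\beta}_2 \in \mathscr{N}(m)$ and also pick a $\vec{\beta}_1$ so that $m(\vec{\beta}_1) = \vec{\beta}_2$.

$$\vec{\beta}_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \qquad \vec{\beta}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$

Recall the similarity diagram.

$$\begin{array}{ccc} \mathbb{C}^2_{\text{wrt } \mathcal{E}_2} & \xrightarrow[\;M\;]{\;m\;} & \mathbb{C}^2_{\text{wrt } \mathcal{E}_2} \\ {\scriptstyle \mathrm{id}}\big\downarrow{\scriptstyle P} & & {\scriptstyle \mathrm{id}}\big\downarrow{\scriptstyle P} \\ \mathbb{C}^2_{\text{wrt } B} & \xrightarrow{\;m\;} & \mathbb{C}^2_{\text{wrt } B} \end{array}$$

The canonical form equals $\mathrm{Rep}_{B,B}(m) = PMP^{-1}$, where

$$P^{-1} = \mathrm{Rep}_{B,\mathcal{E}_2}(\mathrm{id}) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \qquad P = (P^{-1})^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$$

and the verification of the matrix calculation is routine.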
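
The routine verification can be written out in a few lines. This is a minimal sketch with illustrative names (`P_inv` for the matrix whose columns are $\vec{\beta}_1 = (1, 0)$ and $\vec{\beta}_2 = (1, 1)$, and `P` for its inverse); it multiplies out $PMP^{-1}$ and compares the result with the canonical form.

```python
def mat_mul(a, b):
    """Product of two 2x2 (or any square) matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

M = [[1, -1],
     [1, -1]]

# Columns of P_inv are the string basis vectors beta1 = (1, 0), beta2 = (1, 1).
P_inv = [[1, 1],
         [0, 1]]
P = [[1, -1],
     [0, 1]]

identity = [[1, 0], [0, 1]]
inverse_ok = mat_mul(P, P_inv) == identity

# M squared is the zero matrix, so the index of nilpotency is two.
M_squared = mat_mul(M, M)

# The representation with respect to the string basis.
canonical = mat_mul(mat_mul(P, M), P_inv)

print(inverse_ok, M_squared, canonical)
```

The product comes out as the matrix with a single subdiagonal one, matching the string $\vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{0}$.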






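2.18 Example The matrix

$$N = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ -1 & 1 & 1 & -1 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & -1 & 1 & -1 \end{pmatrix}$$

is nilpotent.

$$\begin{array}{c|c|c} \text{power } p & N^p & \mathscr{N}(N^p) \\ \hline 1 & \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ -1 & 1 & 1 & -1 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & -1 & 1 & -1 \end{pmatrix} & \{ \begin{pmatrix} 0 \\ 0 \\ u - v \\ u \\ v \end{pmatrix} \mid u, v \in \mathbb{C} \} \\ 2 & \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} & \{ \begin{pmatrix} 0 \\ y \\ z \\ u \\ v \end{pmatrix} \mid y, z, u, v \in \mathbb{C} \} \\ 3 & \text{zero matrix} & \mathbb{C}^5 \end{array}$$

That table shows that any string basis must satisfy: the null space after one
map application has dimension two so two basis vectors map directly to zero,
the null space after the second application has dimension four so two additional
basis vectors map to zero by the second iteration, and the null space after three
applications is of dimension five so the final basis vector maps to zero in three
hops.

$$\begin{array}{l} \vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{\beta}_3 \mapsto \vec{0} \\ \vec{\beta}_4 \mapsto \vec{\beta}_5 \mapsto \vec{0} \end{array}$$

To produce such a basis, first pick two independent vectors from $\mathscr{N}(n)$

$$\vec{\beta}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix} \qquad \vec{\beta}_5 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}$$

then add $\vec{\beta}_2, \vec{\beta}_4 \in \mathscr{N}(n^2)$ such that $n(\vec{\beta}_2) = \vec{\beta}_3$ and $n(\vec{\beta}_4) = \vec{\beta}_5$

$$\vec{\beta}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \qquad \vec{\beta}_4 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}$$

and finish by adding $\vec{\beta}_1 \in \mathscr{N}(n^3) = \mathbb{C}^5$ such that $n(\vec{\beta}_1) = \vec{\beta}_2$.

$$\vec{\beta}_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}$$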
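
The reasoning in Example 2.18 (read the string lengths off the nullities of the powers) can be phrased as a small procedure. The sketch below is an illustration under stated assumptions, with hypothetical helper names: it uses the fact that $\mathrm{nullity}(N^j) - \mathrm{nullity}(N^{j-1})$ counts the strings of length at least $j$, since each application of the map sends one more vector of every sufficiently long string to zero.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(m):
    """Rank by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in r] for r in m]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                f = rows[r][col] / rows[rk][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def string_lengths(nullities):
    """String lengths from nullities[j] = nullity of N^j (with nullities[0] = 0)."""
    # at_least[j-1] is the number of strings of length at least j.
    at_least = [nullities[j] - nullities[j - 1] for j in range(1, len(nullities))]
    lengths = []
    for j, count in enumerate(at_least, start=1):
        longer = at_least[j] if j < len(at_least) else 0
        lengths.extend([j] * (count - longer))  # strings of length exactly j
    return sorted(lengths, reverse=True)

N = [[0, 0, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [-1, 1, 1, -1, 1],
     [0, 1, 0, 0, 0],
     [1, 0, -1, 1, -1]]

n = len(N)
nullities = [0]
power = N
for _ in range(n):  # a nilpotent n x n matrix has index at most n
    nullities.append(n - rank(power))
    if nullities[-1] == n:
        break
    power = mat_mul(power, N)

print(nullities)                  # [0, 2, 4, 5]
print(string_lengths(nullities))  # [3, 2]
```

The output says one string of length three and one of length two, which is the shape $\vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{\beta}_3 \mapsto \vec{0}$, $\vec{\beta}_4 \mapsto \vec{\beta}_5 \mapsto \vec{0}$ found in the example.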



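Exercises

✓ 2.19 What is the index of nilpotency of the left-shift operator, here acting on the
space of triples of reals?

$$(x, y, z) \mapsto (0, x, y)$$

✓ 2.20 For each string basis state the index of nilpotency and give the dimension of
the range space and null space of each iteration of the nilpotent map.

(a) $\vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{0}$
    $\vec{\beta}_3 \mapsto \vec{\beta}_4 \mapsto \vec{0}$

(b) $\vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{\beta}_3 \mapsto \vec{0}$
    $\vec{\beta}_4 \mapsto \vec{0}$
    $\vec{\beta}_5 \mapsto \vec{0}$
    $\vec{\beta}_6 \mapsto \vec{0}$

(c) $\vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{\beta}_3 \mapsto \vec{0}$

Also give the canonical form of the matrix.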
2.21 Decide which of these matrices are nilpotent. 



(e) 



-2 




(b) 






45 


-22 


-19 


33 


-16 


-14 


69 


-34 


-29 



(c) 





2 






1 




s 


2 


i) 







-i) 




2 






2 
✓ 2.22 Find the canonical form of this matrix.



1 1\ 

1 1 1 





0/ 
















✓ 2.23 Consider the matrix from Example 2.18.

(a) Use the action of the map on the string basis to give the canonical form.

(b) Find the change of basis matrices that bring the matrix to canonical form.

(c) Use the answer in the prior item to check the answer in the first item.

✓ 2.24 Each of these matrices is nilpotent. Put each in canonical form.

2.25 Describe the effect of left or right multiplication by a matrix that is in the
canonical form for nilpotent matrices.

2.26 Is nilpotence invariant under similarity? That is, must a matrix similar to a
nilpotent matrix also be nilpotent? If so, with the same index?

✓ 2.27 Show that the only eigenvalue of a nilpotent matrix is zero.

2.28 Is there a nilpotent transformation of index three on a two-dimensional space?

2.29 In the proof of Theorem 2.15, why isn't the proof's base case that the index of
nilpotency is zero?

✓ 2.30 Let $t\colon V \to V$ be a linear transformation and suppose $\vec{v} \in V$ is such that
$t^k(\vec{v}) = \vec{0}$ but $t^{k-1}(\vec{v}) \neq \vec{0}$. Consider the $t$-string $\langle \vec{v}, t(\vec{v}), \ldots, t^{k-1}(\vec{v}) \rangle$.

(a) Prove that $t$ is a transformation on the span of the set of vectors in the string,
that is, prove that $t$ restricted to the span has a range that is a subset of the
span. We say that the span is a $t$-invariant subspace.

(b) Prove that the restriction is nilpotent.

(c) Prove that the $t$-string is linearly independent and so is a basis for its span.

(d) Represent the restriction map with respect to the $t$-string basis.

2.31 Finish the proof of Theorem 2.15.

2.32 Show that the terms 'nilpotent transformation' and 'nilpotent matrix', as
given in Definition 2.7, fit with each other: a map is nilpotent if and only if it is
represented by a nilpotent matrix. (Is it that a transformation is nilpotent if and
only if there is a basis such that the map's representation with respect to that basis
is a nilpotent matrix, or that any representation is a nilpotent matrix?)

2.33 Let $T$ be nilpotent of index four. How big can the range space of $T^3$ be?

2.34 Recall that similar matrices have the same eigenvalues. Show that the converse
does not hold.

2.35 Lemma 2.1 shows that for any linear transformation $t\colon V \to V$ the restriction
$t\colon \mathscr{R}_\infty(t) \to \mathscr{R}_\infty(t)$ is one-to-one. Show that it is also onto, so it is an automorphism.
Must it be the identity map?

2.36 Prove that a nilpotent matrix is similar to one that is all zeros except for blocks
of super-diagonal ones.

✓ 2.37 Prove that if a transformation has the same range space as null space, then
the dimension of its domain is even.
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2.38 Prove that if two nilpotent matrices commute then their product and sum are 
also nilpotent. 

2.39 Consider the transformation of M^xn given by ts(T) = ST — TS where S is an 
n X n matrix. Prove that if S is nilpotent then so is ts . 

2.40 Show that if N is nilpotent then I — N is invertible. Is that 'only if also? 
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IV Jordan Form 

This section uses material from three optional subsections: Direct Sum, 
Determinants Exist, and Laplace's Expansion Formula. 

The chapter on linear maps shows that every $h\colon V \to W$ can be represented
by a partial identity matrix with respect to some bases $B \subset V$ and $D \subset W$,
that is, that the partial identity form is a canonical form for matrix equivalence.
This chapter considers the special case that the map is a linear transformation
$t\colon V \to V$. The general result still applies so we can get a partial identity with
respect to $B, D$, but with the codomain equal to the domain we naturally ask
what is possible when the two bases are also equal so that we have $\mathrm{Rep}_{B,B}(t)$;
that is, we will find a canonical form for matrix similarity.

We began this chapter by noting that while a partial identity matrix is the
canonical form for the B, D case, in the B, B case there are some matrix similarity
classes without one. We therefore extended the forms of interest to the natural
generalization, diagonal matrices, and showed that the map or matrix can be
diagonalized if its eigenvalues are distinct. But we also gave an example of a
matrix that cannot be diagonalized (because it is nilpotent), and thus diagonal
form won't do as the canonical form for all matrices.

The prior section developed that example. We showed that a linear map is 
nilpotent if and only if there is a basis on which it acts via disjoint strings. That 
gave us a canonical form that applied to nilpotent matrices. 

This section wraps up the chapter by showing that the two cases we've
studied are exhaustive in that for any linear transformation there is a basis such
that the matrix representation $\mathrm{Rep}_{B,B}(t)$ is the sum of a diagonal matrix and a
nilpotent matrix. This is Jordan canonical form.



IV.1 Polynomials of Maps and Matrices

Recall that the set of square matrices $M_{n \times n}$ is a vector space under entry-by-entry
addition and scalar multiplication, and that this space has dimension $n^2$.
Thus, for any $n \times n$ matrix T the $(n^2 + 1)$-member set $\{I, T, T^2, \ldots, T^{n^2}\}$ is linearly
dependent and so there are scalars $c_0, \ldots, c_{n^2}$, not all zero, such that

$$c_{n^2}T^{n^2} + \cdots + c_1T + c_0I$$

is the zero matrix. That is, every transformation has a kind of generalized
nilpotency: the powers of a square matrix cannot climb forever without a
"repeat."

1.1 Example Rotation of plane vectors $\pi/6$ radians counterclockwise is represented
with respect to the standard basis by

$$T = \begin{pmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{pmatrix}$$

and verifying that $0T^4 + 0T^3 + 1T^2 - \sqrt{3}\,T + 1I$ equals the zero matrix is easy.
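
As a quick numerical sanity check of this example, the following sketch builds the rotation matrix for $\pi/6$ and evaluates $T^2 - \sqrt{3}\,T + I$ in floating point. The helper name `mat_mul` and the tolerance are illustrative choices, not part of the text.

```python
import math


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


theta = math.pi / 6
# Counterclockwise rotation of the plane through theta, standard basis.
T = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
I = [[1.0, 0.0], [0.0, 1.0]]

T2 = mat_mul(T, T)
# Entries of T^2 - sqrt(3)*T + I; each should vanish up to rounding error.
residual = [[T2[i][j] - math.sqrt(3) * T[i][j] + I[i][j] for j in range(2)]
            for i in range(2)]
max_residual = max(abs(x) for row in residual for x in row)
print(max_residual < 1e-12)
```

Floating point only shows the entries are zero up to rounding; the exact statement follows from the hand computation.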

1.2 Definition Let t be a linear transformation of a vector space V and let T
be a square matrix. Where $f(x) = c_nx^n + \cdots + c_1x + c_0$ is a polynomial, $f(t)$
is the transformation $c_nt^n + \cdots + c_1t + c_0(\mathrm{id})$ on V and $f(T)$ is the matrix
$c_nT^n + \cdots + c_1T + c_0I$.

The polynomial of the matrix represents the polynomial of the map: if
$T = \mathrm{Rep}_{B,B}(t)$ then $f(T) = \mathrm{Rep}_{B,B}(f(t))$. This is because $T^j = \mathrm{Rep}_{B,B}(t^j)$, and
$cT = \mathrm{Rep}_{B,B}(ct)$, and $T_1 + T_2 = \mathrm{Rep}_{B,B}(t_1 + t_2)$.

1.3 Remark Most authors write the matrix polynomial slightly differently than the
map polynomial. For instance, if $f(x) = x - 3$ then most authors explicitly write
the identity matrix $f(T) = T - 3I$ but don't write the identity map $f(t) = t - 3$.
We shall follow this convention.

Consider again Example 1.1. The space $M_{2 \times 2}$ has dimension four, so we
know that for any $2 \times 2$ matrix T there is a nonzero polynomial f of degree at most
four such that $f(T)$ equals the zero matrix. But for that T we exhibited a polynomial
of degree less than four that gave the zero matrix. So for any particular map or matrix,
degree $n^2$ will suffice but there may be a polynomial of smaller degree that works.

1.4 Definition The minimal polynomial $m(x)$ of a transformation t or a square
matrix T is the polynomial of least degree and with leading coefficient one such
that $m(t)$ is the zero map or $m(T)$ is the zero matrix.

The fact that the leading coefficient must be one keeps a minimal polynomial from
being the zero polynomial. That is, a minimal polynomial must have degree at
least one. Thus, the zero matrix has minimal polynomial $p(x) = x$ while the
identity matrix has minimal polynomial $p(x) = x - 1$.

1.5 Lemma Any transformation or square matrix has a unique minimal polynomial.

Proof We first show existence. The earlier observation that degree $n^2$ suffices
shows that there is at least one nonzero polynomial $p(x) = c_kx^k + \cdots + c_0$ that
takes the map or matrix to zero (p is not the zero polynomial because the earlier
observation includes that at least one of the coefficients is nonzero). From among
all nonzero polynomials taking the map or matrix to zero, there must be at least
one of minimal degree. Divide this p by $c_k$ to get a leading one. Thus for any
map or matrix a minimal polynomial exists.

We now show uniqueness. Suppose that $m(x)$ and $\hat{m}(x)$ both take the map
or matrix to zero, are both of minimal degree and are thus of equal degree, and
both have a leading one. In their difference $d(x) = m(x) - \hat{m}(x)$ the leading
terms cancel. So d is of smaller degree than m and $\hat{m}$. If d were to have a
leading coefficient that is nonzero then we could divide by it to get a polynomial
that takes the map or matrix to zero and has leading coefficient one. This
would contradict the choice of m and $\hat{m}$ as of minimal degree. Thus the leading
coefficient of d is zero, so $m(x) - \hat{m}(x)$ is the zero polynomial, and so the two
are equal. QED

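1.6 Example We can see that $m(x) = x^2 - \sqrt{3}\,x + 1$ is minimal for the matrix of
Example 1.1 by computing the powers of T up to the power $n^2 = 4$.

$$T^2 = \begin{pmatrix} 1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{pmatrix} \qquad T^3 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \qquad T^4 = \begin{pmatrix} -1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & -1/2 \end{pmatrix}$$

Put $c_4T^4 + c_3T^3 + c_2T^2 + c_1T + c_0I$ equal to the zero matrix

$$\begin{aligned}
-(1/2)c_4 + (1/2)c_2 + (\sqrt{3}/2)c_1 + c_0 &= 0 \\
-(\sqrt{3}/2)c_4 - c_3 - (\sqrt{3}/2)c_2 - (1/2)c_1 &= 0 \\
(\sqrt{3}/2)c_4 + c_3 + (\sqrt{3}/2)c_2 + (1/2)c_1 &= 0 \\
-(1/2)c_4 + (1/2)c_2 + (\sqrt{3}/2)c_1 + c_0 &= 0
\end{aligned}$$

and use Gauss' Method.

$$\begin{aligned}
c_4 - c_2 - \sqrt{3}\,c_1 - 2c_0 &= 0 \\
c_3 + \sqrt{3}\,c_2 + 2c_1 + \sqrt{3}\,c_0 &= 0
\end{aligned}$$

Setting $c_4$, $c_3$, and $c_2$ to zero forces $c_1$ and $c_0$ to also come out as zero. To get
a leading one, the most we can do is to set $c_4$ and $c_3$ to zero. Taking $c_2 = 1$ then
gives $c_1 = -\sqrt{3}$ and $c_0 = 1$. Thus the minimal polynomial is the quadratic
$m(x) = x^2 - \sqrt{3}\,x + 1$.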

Using the method of that example to find the minimal polynomial of a 3 x 3 
matrix would be tedious because it would mean doing Gaussian reduction on a 
system with nine equations in ten unknowns. We shall develop an alternative. 
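
The linear-dependence method of that example can be mechanized. The sketch below is one possible implementation, restricted to matrices with rational entries so that the arithmetic is exact. It looks for the smallest k such that $T^k$ is a combination of the lower powers. The function names `solve_exact` and `minimal_polynomial` are hypothetical helpers introduced here for illustration.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def solve_exact(columns, target):
    """Solve sum_j x[j] * columns[j] == target by Gauss' Method.

    Returns one solution as a list of Fractions, or None if inconsistent.
    """
    rows, k = len(target), len(columns)
    aug = [[columns[j][i] for j in range(k)] + [target[i]] for i in range(rows)]
    pivot_cols, r = [], 0
    for c in range(k):
        p = next((i for i in range(r, rows) if aug[i][c] != 0), None)
        if p is None:
            continue
        aug[r], aug[p] = aug[p], aug[r]
        lead = aug[r][c]
        aug[r] = [x / lead for x in aug[r]]
        for i in range(rows):
            if i != r and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[r])]
        pivot_cols.append(c)
        r += 1
    if any(aug[i][k] != 0 for i in range(r, rows)):
        return None
    x = [Fraction(0)] * k
    for row, c in enumerate(pivot_cols):
        x[c] = aug[row][k]
    return x


def minimal_polynomial(T):
    """Coefficients [c_0, ..., c_{k-1}, 1] of the minimal polynomial of T."""
    n = len(T)
    T = [[Fraction(x) for x in row] for row in T]
    powers = [[[Fraction(int(i == j)) for j in range(n)] for i in range(n)]]
    while True:
        nxt = mat_mul(powers[-1], T)
        columns = [[x for row in P for x in row] for P in powers]
        target = [x for row in nxt for x in row]
        sol = solve_exact(columns, target)
        if sol is not None:
            # T^k = sum_j sol[j] T^j, so m(x) = x^k - sum_j sol[j] x^j.
            return [-s for s in sol] + [Fraction(1)]
        powers.append(nxt)


# A sample matrix chosen for illustration; its minimal polynomial is
# (x - 2)^2 (x - 1) = x^3 - 5x^2 + 8x - 4.
coeffs = minimal_polynomial([[2, 0, 0], [1, 2, 0], [0, 0, 1]])
print([int(c) for c in coeffs])
```

The loop always terminates because the powers $I, T, \ldots, T^{n^2}$ are linearly dependent. This brute-force search is meant to mirror the example, not to be efficient.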

1.7 Lemma Suppose that the polynomial $f(x) = c_nx^n + \cdots + c_1x + c_0$ factors
as $k(x - \lambda_1)^{q_1} \cdots (x - \lambda_z)^{q_z}$. If t is a linear transformation then these two are
equal maps.

$$c_nt^n + \cdots + c_1t + c_0 = k \cdot (t - \lambda_1)^{q_1} \circ \cdots \circ (t - \lambda_z)^{q_z}$$

Consequently, if T is a square matrix then $f(T)$ and $k \cdot (T - \lambda_1I)^{q_1} \cdots (T - \lambda_zI)^{q_z}$
are equal matrices.

Proof We use induction on the degree of the polynomial. The cases where
the polynomial is of degree zero and degree one are clear. The full induction
argument is Exercise 1.33 but we will give its sense with the degree two case.

A quadratic polynomial factors into two linear terms
$f(x) = k(x - \lambda_1) \cdot (x - \lambda_2) = k(x^2 - (\lambda_1 + \lambda_2)x + \lambda_1\lambda_2)$ (the roots $\lambda_1$ and $\lambda_2$
could be equal). We can check that substituting t for x in the factored and
unfactored versions gives the same map.

$$\begin{aligned}
\bigl(k \cdot (t - \lambda_1) \circ (t - \lambda_2)\bigr)(\vec{v}) &= \bigl(k \cdot (t - \lambda_1)\bigr)(t(\vec{v}) - \lambda_2\vec{v}) \\
&= k \cdot \bigl(t(t(\vec{v})) - t(\lambda_2\vec{v}) - \lambda_1t(\vec{v}) + \lambda_1\lambda_2\vec{v}\bigr) \\
&= k \cdot \bigl(t \circ t(\vec{v}) - (\lambda_1 + \lambda_2)t(\vec{v}) + \lambda_1\lambda_2\vec{v}\bigr) \\
&= k \cdot \bigl(t^2 - (\lambda_1 + \lambda_2)t + \lambda_1\lambda_2\bigr)(\vec{v})
\end{aligned}$$

The third equality holds because the scalar $\lambda_2$ comes out of the second term,
since t is linear. QED

In particular, if a minimal polynomial $m(x)$ for a transformation t factors
as $m(x) = (x - \lambda_1)^{q_1} \cdots (x - \lambda_z)^{q_z}$ then $m(t) = (t - \lambda_1)^{q_1} \circ \cdots \circ (t - \lambda_z)^{q_z}$ is
the zero map. Since $m(t)$ sends every vector to zero, at least one of the maps
$t - \lambda_i$ sends some nonzero vectors to zero. Exactly the same holds in the matrix
case: if m is minimal for T then $m(T) = (T - \lambda_1I)^{q_1} \cdots (T - \lambda_zI)^{q_z}$ is the zero
matrix and at least one of the matrices $T - \lambda_iI$ sends some nonzero vectors to
zero. That is, in both cases at least some of the $\lambda_i$ are eigenvalues. (Exercise 29
expands on this.)

The next result is that every root of the minimal polynomial is an eigenvalue,
and further that every eigenvalue is a root of the minimal polynomial (i.e., below
it says '$1 \le q_i$' and not just '$0 \le q_i$'). For that result, recall that to find
eigenvalues we solve $|T - xI| = 0$ and this determinant gives a polynomial in x,
called the characteristic polynomial, whose roots are the eigenvalues.

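1.8 Theorem (Cayley-Hamilton) If the characteristic polynomial of a transformation
or square matrix factors into

$$k \cdot (x - \lambda_1)^{p_1}(x - \lambda_2)^{p_2} \cdots (x - \lambda_z)^{p_z}$$

then its minimal polynomial factors into

$$(x - \lambda_1)^{q_1}(x - \lambda_2)^{q_2} \cdots (x - \lambda_z)^{q_z}$$

where $1 \le q_i \le p_i$ for each i between 1 and z.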

The proof takes up the next three lemmas. We will state them in matrix terms 
but they apply equally well to maps. (The matrix version is convenient for the 
first proof.) 

The first result is the key. For the proof, observe that we can view a matrix 
of polynomials as a polynomial with matrix coefficients. 

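$$\begin{pmatrix} 2x^2 + 3x - 1 & x^2 + 2 \\ 3x^2 + 4x + 1 & 4x^2 + x + 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 3 & 4 \end{pmatrix}x^2 + \begin{pmatrix} 3 & 0 \\ 4 & 1 \end{pmatrix}x + \begin{pmatrix} -1 & 2 \\ 1 & 1 \end{pmatrix}$$

1.9 Lemma If T is a square matrix with characteristic polynomial $c(x)$ then $c(T)$
is the zero matrix.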

Proof Let C be $T - xI$, the matrix whose determinant is the characteristic
polynomial $c(x) = c_nx^n + \cdots + c_1x + c_0$.

$$C = \begin{pmatrix} t_{1,1} - x & t_{1,2} & \cdots & \\ t_{2,1} & t_{2,2} - x & & \\ \vdots & & \ddots & \\ & & & t_{n,n} - x \end{pmatrix}$$

Recall Theorem Four.III.1.9, that the product of a matrix with its adjoint equals
the determinant of the matrix times the identity.

$$c(x) \cdot I = \mathrm{adj}(C)\,C = \mathrm{adj}(C)(T - xI) = \mathrm{adj}(C)\,T - \mathrm{adj}(C) \cdot x \qquad (*)$$

The left side of (*) is $c_nIx^n + c_{n-1}Ix^{n-1} + \cdots + c_1Ix + c_0I$. For the right side, the
entries of $\mathrm{adj}(C)$ are polynomials, each of degree at most $n - 1$ since the minors
of a matrix drop a row and column. As suggested before the proof, rewrite it
as a polynomial with matrix coefficients: $\mathrm{adj}(C) = C_{n-1}x^{n-1} + \cdots + C_1x + C_0$
where each $C_i$ is a matrix of scalars. Now this is the right side of (*).

$$[(C_{n-1}T)x^{n-1} + \cdots + (C_1T)x + C_0T] - [C_{n-1}x^n + C_{n-2}x^{n-1} + \cdots + C_0x]$$

Equate the left and right side of (*)'s coefficients of $x^n$, of $x^{n-1}$, etc.

$$\begin{aligned}
c_nI &= -C_{n-1} \\
c_{n-1}I &= -C_{n-2} + C_{n-1}T \\
&\;\;\vdots \\
c_1I &= -C_0 + C_1T \\
c_0I &= C_0T
\end{aligned}$$

Multiply, from the right, both sides of the first equation by $T^n$, both sides of
the second equation by $T^{n-1}$, etc.

$$\begin{aligned}
c_nT^n &= -C_{n-1}T^n \\
c_{n-1}T^{n-1} &= -C_{n-2}T^{n-1} + C_{n-1}T^n \\
&\;\;\vdots \\
c_1T &= -C_0T + C_1T^2 \\
c_0I &= C_0T
\end{aligned}$$

Add. The left is $c_nT^n + c_{n-1}T^{n-1} + \cdots + c_0I$. The right telescopes; for instance
$-C_{n-1}T^n$ from the first line combines with the $+C_{n-1}T^n$ half of the second
line, and the total on the right is the zero matrix. QED

We refer to that result by saying that a matrix or map satisfies its characteristic
polynomial.
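
The statement that a matrix satisfies its characteristic polynomial is easy to test on examples. The sketch below does so with exact rational arithmetic. It is an illustration under two assumptions. First, it computes the coefficients with the Faddeev-LeVerrier recurrence instead of expanding the determinant as the text does. Second, it uses the monic convention $\det(xI - T)$, which differs from $|T - xI|$ only by the factor $(-1)^n$ and so vanishes at T in exactly the same cases. The function names are hypothetical.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def char_poly(T):
    """Coefficients [c_0, ..., c_n] of det(xI - T), Faddeev-LeVerrier recurrence."""
    n = len(T)
    T = [[Fraction(x) for x in row] for row in T]
    I = identity(n)
    M = [[Fraction(0)] * n for _ in range(n)]
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    for k in range(1, n + 1):
        TM = mat_mul(T, M)
        M = [[TM[i][j] + c[n - k + 1] * I[i][j] for j in range(n)]
             for i in range(n)]
        TM = mat_mul(T, M)
        c[n - k] = -sum(TM[i][i] for i in range(n)) / k
    return c


def evaluate(coeffs, T):
    """Evaluate c_0 I + c_1 T + ... + c_n T^n by Horner's rule."""
    n = len(T)
    T = [[Fraction(x) for x in row] for row in T]
    I = identity(n)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, T)
        result = [[result[i][j] + c * I[i][j] for j in range(n)]
                  for i in range(n)]
    return result


T = [[1, 2], [3, 4]]                 # sample matrix chosen for illustration
c = char_poly(T)                     # x^2 - 5x - 2
print(all(x == 0 for row in evaluate(c, T) for x in row))
```

A check on examples is of course not a proof; it only illustrates what the lemma asserts.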

1.10 Lemma Where $f(x)$ is a polynomial, if $f(T)$ is the zero matrix then $f(x)$
is divisible by the minimal polynomial of T. That is, any polynomial that is
satisfied by T is divisible by T's minimal polynomial.

Proof Let $m(x)$ be minimal for T. The Division Theorem for Polynomials gives
$f(x) = q(x)m(x) + r(x)$ where the degree of r is strictly less than the degree of
m. Plugging in T shows that $r(T)$ is the zero matrix, because T satisfies both f
and m. That contradicts the minimality of m unless r is the zero polynomial.
QED

Combining the prior two lemmas gives that the minimal polynomial divides
the characteristic polynomial. Thus, any root of the minimal polynomial is
also a root of the characteristic polynomial. That is, so far we have that if
$m(x) = (x - \lambda_1)^{q_1} \cdots (x - \lambda_i)^{q_i}$ then $c(x)$ has the form
$(x - \lambda_1)^{p_1} \cdots (x - \lambda_i)^{p_i}(x - \lambda_{i+1})^{p_{i+1}} \cdots (x - \lambda_z)^{p_z}$ where each $q_j$ is less than
or equal to $p_j$. We finish the proof of the Cayley-Hamilton Theorem by showing that
the characteristic polynomial has no additional roots, that is, there are no
$\lambda_{i+1}$, $\lambda_{i+2}$, etc.

1.11 Lemma Each linear factor of the characteristic polynomial of a square matrix
is also a linear factor of the minimal polynomial.

Proof Let T be a square matrix with minimal polynomial $m(x)$ and assume
that $x - \lambda$ is a factor of the characteristic polynomial of T, that is, that $\lambda$ is an
eigenvalue of T. We must show that $x - \lambda$ is a factor of m, i.e., that $m(\lambda) = 0$.

Suppose that $\lambda$ is an eigenvalue of T with associated eigenvector $\vec{v}$. Then $T^2$
has the eigenvalue $\lambda^2$ associated with $\vec{v}$ because $T \cdot T\vec{v} = T \cdot \lambda\vec{v} = \lambda T\vec{v} = \lambda^2\vec{v}$.
Similarly $T^k$ has the eigenvalue $\lambda^k$ associated with $\vec{v}$.

With that we have that for any polynomial function $p(x)$, application of the
matrix $p(T)$ to $\vec{v}$ equals the result of multiplying $\vec{v}$ by the scalar $p(\lambda)$.

$$\begin{aligned}
p(T) \cdot \vec{v} = (c_kT^k + \cdots + c_1T + c_0I) \cdot \vec{v} &= c_kT^k\vec{v} + \cdots + c_1T\vec{v} + c_0\vec{v} \\
&= c_k\lambda^k\vec{v} + \cdots + c_1\lambda\vec{v} + c_0\vec{v} = p(\lambda) \cdot \vec{v}
\end{aligned}$$

Since $m(T)$ is the zero matrix, $\vec{0} = m(T)(\vec{v}) = m(\lambda) \cdot \vec{v}$, and since the
eigenvector $\vec{v}$ is nonzero, $m(\lambda) = 0$. QED

That concludes the proof of the Cayley-Hamilton Theorem. 
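
In practice the theorem reduces the search for a minimal polynomial to a short list of candidates: keep every linear factor of the characteristic polynomial and try the exponents from smallest to largest. The following sketch illustrates that search. The 4x4 matrix is an assumption chosen for illustration, with characteristic polynomial $(x - 1)(x - 2)^3$, and the helper names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shift(T, lam):
    """Return T - lam*I."""
    n = len(T)
    return [[T[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def evaluate_factored(T, factors):
    """Multiply out the product of (T - lam*I)^q over the (lam, q) pairs."""
    n = len(T)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for lam, q in factors:
        for _ in range(q):
            result = mat_mul(result, shift(T, lam))
    return result


def is_zero(M):
    return all(x == 0 for row in M for x in row)


# Illustrative matrix (an assumption, not taken from the text); expanding
# along the last row gives the characteristic polynomial (x - 1)(x - 2)^3.
T = [[2, 0, 0, 1],
     [1, 2, 0, 2],
     [0, 0, 2, -1],
     [0, 0, 0, 1]]

# Candidates (x - 1)(x - 2)^q for q = 1, 2, 3, in order of increasing degree.
candidates = [[(1, 1), (2, q)] for q in (1, 2, 3)]
minimal = next(c for c in candidates if is_zero(evaluate_factored(T, c)))
print(minimal)  # -> [(1, 1), (2, 2)], that is, (x - 1)(x - 2)^2
```

Trying candidates in order of increasing degree matters, since every multiple of the minimal polynomial also sends T to the zero matrix.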

1.12 Example We can use the Cayley-Hamilton Theorem to find the minimal
polynomial of this matrix.

T = [matrix not recoverable from the source]

First we find its characteristic polynomial $c(x) = (x - 1)(x - 2)^3$ with the
usual determinant. Now the Cayley-Hamilton Theorem says that T's minimal
polynomial is either $(x - 1)(x - 2)$ or $(x - 1)(x - 2)^2$ or $(x - 1)(x - 2)^3$. We can
decide among the choices just by computing $(T - 1I)(T - 2I)$, which is not the zero
matrix, and $(T - 1I)(T - 2I)^2$, which is the zero matrix, and so
$m(x) = (x - 1)(x - 2)^2$.

Exercises




/ 1.13 What are the possible minimal polynomials if a matrix has the given character- 
istic polynomial? 

(a) $8 \cdot (x - 3)^4$ (b) $(1/3) \cdot (x + 1)^3(x - 4)$ (c) $-1 \cdot (x - 2)^2(x - 5)^2$
(d) $5 \cdot (x + 3)^2(x - 1)(x - 2)^2$
What is the degree of each possibility? 
/ 1.14 Find the minimal polynomial of each matrix. 



(a) 





(e) 



1.15 Find the minimal polynomial of this matrix. 

pi?) 

\1 0/ 

/ 1.16 What is the minimal polynomial of the differentiation operator d/dx on 9^? 
/ 1.17 Find the minimal polynomial of matrices of this form

$$\begin{pmatrix} \lambda & 0 & 0 & \cdots & 0 \\ 1 & \lambda & 0 & & 0 \\ 0 & 1 & \lambda & & \\ & & \ddots & \ddots & \\ 0 & 0 & \cdots & 1 & \lambda \end{pmatrix}$$

where the scalar $\lambda$ is fixed (i.e., is not a variable).

1.18 What is the minimal polynomial of the transformation of $\mathcal{P}_n$ that sends $p(x)$
to $p(x + 1)$?

1.19 What is the minimal polynomial of the map $\pi\colon \mathbb{C}^3 \to \mathbb{C}^3$ projecting onto the
first two coordinates?

1.20 Find a 3x3 matrix whose minimal polynomial is $x^2$.

1.21 What is wrong with this claimed proof of Lemma 1.9: "if $c(x) = |T - xI|$ then
$c(T) = |T - TI| = 0$"? [Cullen]

1.22 Verify Lemma 1.9 for 2x2 matrices by direct calculation. 

/ 1.23 Prove that the minimal polynomial of an $n \times n$ matrix has degree at most n
(not $n^2$ as a person might guess from this subsection's opening). Verify that this
maximum, n, can happen.

/ 1.24 Show that, on a nontrivial vector space, a linear transformation is nilpotent if
and only if its only eigenvalue is zero.

1.25 What is the minimal polynomial of a zero map or matrix? Of an identity map
or matrix?

/ 1.26 Interpret the minimal polynomial of Example 1.1 geometrically. 

1.27 What is the minimal polynomial of a diagonal matrix? 
/ 1.28 A projection is any transformation t such that $t^2 = t$. (For instance, consider
the transformation of the plane projecting each vector onto its first coordinate.
If we project twice then we get the same result as if we project just once.) What is
the minimal polynomial of a projection?

1.29 The first two items of this question are review. 

(a) Prove that the composition of one-to-one maps is one-to-one. 

(b) Prove that if a linear map is not one-to-one then at least one nonzero vector 
from the domain maps to the zero vector in the codomain. 

(c) Verify the statement, excerpted here, that precedes Theorem 1.8. 

... if a minimal polynomial $m(x)$ for a transformation t factors as
$m(x) = (x - \lambda_1)^{q_1} \cdots (x - \lambda_z)^{q_z}$ then $m(t) = (t - \lambda_1)^{q_1} \circ \cdots \circ (t - \lambda_z)^{q_z}$
is the zero map. Since $m(t)$ sends every vector to zero, at least one of
the maps $t - \lambda_i$ sends some nonzero vectors to zero. ... That is, ...
at least some of the $\lambda_i$ are eigenvalues.

1.30 True or false: for a transformation on an n dimensional space, if the minimal 
polynomial has degree n then the map is diagonalizable. 

1.31 Let $f(x)$ be a polynomial. Prove that if A and B are similar matrices then $f(A)$
is similar to $f(B)$.

(a) Now show that similar matrices have the same characteristic polynomial. 

(b) Show that similar matrices have the same minimal polynomial. 

(c) Decide if these are similar. 

'^ 3\ f4 -r 



1.32 (a) Show that a matrix is invertible if and only if the constant term in its 
minimal polynomial is not 0. 

(b) Show that if a square matrix T is not invertible then there is a nonzero matrix 
S such that ST and TS both equal the zero matrix. 
/ 1.33 (a) Finish the proof of Lemma 1.7. 

(b) Give an example to show that the result does not hold if t is not linear. 
1.34 Any transformation or square matrix has a minimal polynomial. Does the
converse hold?

IV.2 Jordan Canonical Form

We are looking for a canonical form for matrix similarity. This subsection 
completes this program by moving from the canonical form for the classes of 
nilpotent matrices to the canonical form for all classes. 

2.1 Lemma A linear transformation on a nontrivial vector space is nilpotent if 
and only if its only eigenvalue is zero. 

Proof Let the linear transformation be $t\colon V \to V$. If t is nilpotent then there
is an n such that $t^n$ is the zero map, so t satisfies the polynomial
$p(x) = x^n = (x - 0)^n$. By Lemma 1.10 the minimal polynomial of t divides p, so the
minimal polynomial has only zero for a root. By Cayley-Hamilton, Theorem 1.8, the
characteristic polynomial has only zero for a root. Thus the only eigenvalue of t
is zero.

Conversely, if a transformation t on an n-dimensional space has only the
single eigenvalue of zero then its characteristic polynomial is $x^n$. Lemma 1.9
says that a map satisfies its characteristic polynomial so $t^n$ is the zero map.
Thus t is nilpotent. QED

The phrase "nontrivial vector space" is there because on a trivial space {0} the 
only transformation is the zero map, which has no eigenvalues because there are 
no associated nonzero eigenvectors. 

2.2 Corollary The transformation $t - \lambda$ is nilpotent if and only if t's only eigenvalue
is $\lambda$.

Proof The transformation $t - \lambda$ is nilpotent if and only if $t - \lambda$'s only eigenvalue
is 0. That holds if and only if t's only eigenvalue is $\lambda$, because $t(\vec{v}) = \lambda\vec{v}$ if and
only if $(t - \lambda)(\vec{v}) = 0 \cdot \vec{v}$. QED

We already have the canonical form that we want for the case of nilpotent 
matrices, that is, for each matrix whose only eigenvalue is zero. Corollary III.2.16 
says that each such matrix is similar to one that is all zeroes except for blocks 
of subdiagonal ones. 

2.3 Lemma If the matrices $T - \lambda I$ and N are similar then T and $N + \lambda I$ are also
similar, via the same change of basis matrices.

Proof With $N = P(T - \lambda I)P^{-1} = PTP^{-1} - P(\lambda I)P^{-1}$ we have
$N = PTP^{-1} - PP^{-1}(\lambda I)$ since the diagonal matrix $\lambda I$ commutes with anything, and
so $N = PTP^{-1} - \lambda I$. Therefore $N + \lambda I = PTP^{-1}$. QED
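
The lemma can be checked on a small case with exact arithmetic. The sketch below uses the 2x2 matrices that appear in the similarity equation of Example 2.4, with $\lambda = 3$. The variable names are illustrative.

```python
from fractions import Fraction as F


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


lam = 3
I = [[1, 0], [0, 1]]
T = [[2, -1], [1, 4]]
P = [[F(1, 2), F(1, 2)], [F(-1, 4), F(1, 4)]]
P_inv = [[1, -2], [1, 2]]

# T - lam*I, then its conjugate N = P (T - lam*I) P^{-1}.
T_shift = [[T[i][j] - lam * I[i][j] for j in range(2)] for i in range(2)]
N = mat_mul(mat_mul(P, T_shift), P_inv)

# The lemma says the same P conjugates T to N + lam*I.
conjugate_of_T = mat_mul(mat_mul(P, T), P_inv)
N_plus = [[N[i][j] + lam * I[i][j] for j in range(2)] for i in range(2)]
print(conjugate_of_T == N_plus)
```

Here N comes out as the nilpotent matrix with a single subdiagonal one, so $N + 3I$ has threes on the diagonal and a one below.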

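2.4 Example The characteristic polynomial of

$$T = \begin{pmatrix} 2 & -1 \\ 1 & 4 \end{pmatrix}$$

is $(x - 3)^2$ and so T has only the single eigenvalue 3. Thus for

$$T - 3I = \begin{pmatrix} -1 & -1 \\ 1 & 1 \end{pmatrix}$$

the only eigenvalue is 0 and $T - 3I$ is nilpotent. Finding the null spaces is routine;
to ease this computation we take T to represent a transformation $t\colon \mathbb{C}^2 \to \mathbb{C}^2$
with respect to the standard basis (we shall do this for the rest of the chapter).

$$\mathscr{N}(t - 3) = \{ \begin{pmatrix} -y \\ y \end{pmatrix} \mid y \in \mathbb{C} \} \qquad \mathscr{N}((t - 3)^2) = \mathbb{C}^2$$

The dimensions of these null spaces show that the action of the map $t - 3$ on a
string basis is $\vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{0}$. Thus, here is the canonical form for $t - 3$ with one
choice for a string basis.

$$\mathrm{Rep}_{B,B}(t - 3) = N = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \qquad B = \langle \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -2 \\ 2 \end{pmatrix} \rangle$$

By Lemma 2.3, T is similar to this matrix.

$$\mathrm{Rep}_{B,B}(t) = N + 3I = \begin{pmatrix} 3 & 0 \\ 1 & 3 \end{pmatrix}$$

We can produce the similarity computation. Recall how to find the change of
basis matrices P and $P^{-1}$ to express N as $P(T - 3I)P^{-1}$. The similarity diagram

$$\begin{array}{ccc}
\mathbb{C}^2_{\text{wrt } \mathcal{E}_2} & \xrightarrow{\;T - 3I\;} & \mathbb{C}^2_{\text{wrt } \mathcal{E}_2} \\
{\scriptstyle \mathrm{id}} \downarrow & & {\scriptstyle \mathrm{id}} \downarrow \\
\mathbb{C}^2_{\text{wrt } B} & \xrightarrow{\;N\;} & \mathbb{C}^2_{\text{wrt } B}
\end{array}$$

describes that to move from the lower left to the upper left we multiply by

$$P^{-1} = \bigl(\mathrm{Rep}_{\mathcal{E}_2,B}(\mathrm{id})\bigr)^{-1} = \mathrm{Rep}_{B,\mathcal{E}_2}(\mathrm{id}) = \begin{pmatrix} 1 & -2 \\ 1 & 2 \end{pmatrix}$$

and to move from the upper right to the lower right we multiply by this matrix.

$$P = \begin{pmatrix} 1 & -2 \\ 1 & 2 \end{pmatrix}^{-1} = \begin{pmatrix} 1/2 & 1/2 \\ -1/4 & 1/4 \end{pmatrix}$$

So this equation expresses the similarity.

$$\begin{pmatrix} 3 & 0 \\ 1 & 3 \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 \\ -1/4 & 1/4 \end{pmatrix} \begin{pmatrix} 2 & -1 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ 1 & 2 \end{pmatrix}$$

2.5 Example This matrix has characteristic polynomial $(x - 4)^4$

T = [matrix not recoverable from the source]

and so has the single eigenvalue 4. The nullities are: the null space of $t - 4$ has
dimension two, the null space of $(t - 4)^2$ has dimension three, and the null space
of $(t - 4)^3$ has dimension four. Thus, $t - 4$ has the action on a string basis of
$\vec{\beta}_1 \mapsto \vec{\beta}_2 \mapsto \vec{\beta}_3 \mapsto \vec{0}$ and $\vec{\beta}_4 \mapsto \vec{0}$. This gives the canonical form N for $t - 4$,

$$N = \mathrm{Rep}_{B,B}(t - 4) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

which in turn gives the form for t.

$$N + 4I = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 1 & 4 & 0 & 0 \\ 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}$$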
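
The step from nullities to string lengths in the example above is mechanical, so it can be sketched in code. The 4x4 matrix below is an assumption chosen for illustration: it has 4 as its only eigenvalue and the same nullities two, three, four. The helpers `rank`, `nullities`, and `string_lengths` are hypothetical names.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(M):
    """Rank by Gauss' Method, using exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def nullities(T, lam):
    """Nullities of (T - lam*I)^k for k = 1, 2, ... until they stop growing."""
    n = len(T)
    shifted = [[T[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    result = []
    for _ in range(n):
        power = mat_mul(power, shifted)
        result.append(n - rank(power))
        if len(result) > 1 and result[-1] == result[-2]:
            result.pop()
            break
    return result


def string_lengths(nulls):
    """String lengths, longest first, from the nullities of successive powers."""
    steps = [nulls[0]] + [b - a for a, b in zip(nulls, nulls[1:])]
    # steps[k] counts the strings of length at least k + 1.
    return [sum(1 for s in steps if s > j) for j in range(steps[0])]


# Illustrative matrix (an assumption, not taken from the text).
T = [[4, 1, 0, -1],
     [0, 3, 0, 1],
     [0, 0, 4, 0],
     [1, 0, 0, 5]]
print(nullities(T, 4), string_lengths(nullities(T, 4)))
```

A string of length three and a string of length one correspond to the two blocks of the canonical form: a 3x3 block with two subdiagonal ones and a 1x1 block.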



An array that is all zeroes, except for some number A down the diagonal and 
blocks of subdiagonal ones, is a Jordan block. We have shown that Jordan block 
matrices are canonical representatives of the similarity classes of single-eigenvalue 
matrices. 

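2.6 Example The 3x3 matrices whose only eigenvalue is 1/2 separate into three
similarity classes. The three classes have these canonical representatives.

$$\begin{pmatrix} 1/2 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/2 \end{pmatrix} \qquad \begin{pmatrix} 1/2 & 0 & 0 \\ 1 & 1/2 & 0 \\ 0 & 0 & 1/2 \end{pmatrix} \qquad \begin{pmatrix} 1/2 & 0 & 0 \\ 1 & 1/2 & 0 \\ 0 & 1 & 1/2 \end{pmatrix}$$

In particular, this matrix

$$\begin{pmatrix} 1/2 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 1 & 1/2 \end{pmatrix}$$

belongs to the similarity class represented by the middle one, because we have
adopted the convention of ordering the blocks of subdiagonal ones from the
longest block to the shortest.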

We will now finish the program of this chapter by extending this work to
cover maps and matrices with multiple eigenvalues. The best possibility for
general maps and matrices would be if we could break them into a part involving
their first eigenvalue $\lambda_1$ (which we represent using its Jordan block), a part with
$\lambda_2$, etc.

This best possibility is what happens. For any transformation $t\colon V \to V$, we
shall break the space V into the direct sum of a part on which $t - \lambda_1$ is nilpotent,
a part on which $t - \lambda_2$ is nilpotent, etc.

Suppose that $t\colon V \to V$ is a linear transformation. The restriction* of t to a
subspace M need not be a linear transformation on M because there may be an
$\vec{m} \in M$ with $t(\vec{m}) \notin M$ (for instance, the transformation that rotates the plane
by a quarter turn does not map most members of the $x = y$ line subspace back
within that subspace). To ensure that the restriction of a transformation to a
part of a space is a transformation on the part we need the next condition.

* More information on restrictions of functions is in the appendix.

2.7 Definition Let $t\colon V \to V$ be a transformation. A subspace M is t invariant
if whenever $\vec{m} \in M$ then $t(\vec{m}) \in M$ (shorter: $t(M) \subseteq M$).

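Recall that Lemma III.1.4 shows that for any transformation t on an n dimensional
space the range spaces of iterates are stable

$$\mathscr{R}(t^n) = \mathscr{R}(t^{n+1}) = \cdots = \mathscr{R}_\infty(t)$$

as are the null spaces.

$$\mathscr{N}(t^n) = \mathscr{N}(t^{n+1}) = \cdots = \mathscr{N}_\infty(t)$$

Thus, the generalized null space $\mathscr{N}_\infty(t)$ and the generalized range space $\mathscr{R}_\infty(t)$
are t invariant. In particular, $\mathscr{N}_\infty(t - \lambda_i)$ and $\mathscr{R}_\infty(t - \lambda_i)$ are $t - \lambda_i$ invariant.

The action of the transformation $t - \lambda_i$ on $\mathscr{N}_\infty(t - \lambda_i)$ is especially easy to
understand. Observe that any transformation t is nilpotent on $\mathscr{N}_\infty(t)$, because if
$\vec{v} \in \mathscr{N}_\infty(t)$ then by definition $t^n(\vec{v}) = \vec{0}$. Thus $t - \lambda_i$ is nilpotent on $\mathscr{N}_\infty(t - \lambda_i)$.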

We shall take three steps to prove this section's major result. The next result 
is the first. 

2.8 Lemma A subspace is t invariant if and only if it is $t - \lambda$ invariant for all
scalars $\lambda$. In particular, if $\lambda_i$ is an eigenvalue of a linear transformation t then
for any other eigenvalue $\lambda_j$ the spaces $\mathscr{N}_\infty(t - \lambda_i)$ and $\mathscr{R}_\infty(t - \lambda_i)$ are $t - \lambda_j$
invariant.

Proof For the first sentence we check the two implications separately. The
'if' half is easy: if the subspace is $t - \lambda$ invariant for all scalars $\lambda$ then using
$\lambda = 0$ shows that it is t invariant. For 'only if' suppose that the subspace is t
invariant, so that if $\vec{m} \in M$ then $t(\vec{m}) \in M$, and let $\lambda$ be a scalar. The subspace
M is closed under linear combinations and so if $t(\vec{m}) \in M$ then $t(\vec{m}) - \lambda\vec{m} \in M$.
Thus if $\vec{m} \in M$ then $(t - \lambda)(\vec{m}) \in M$.

The second sentence follows from the first. The two spaces are $t - \lambda_i$ invariant
so they are t invariant. Apply the first sentence again to conclude that they are
also $t - \lambda_j$ invariant. QED

The second step of the three that we will take to prove this section's major
result makes use of an additional property of $\mathscr{N}_\infty(t - \lambda_i)$ and $\mathscr{R}_\infty(t - \lambda_i)$, that
they are complementary. Recall that if a space is the direct sum of two others
$V = \mathscr{N} \oplus \mathscr{R}$ then any vector $\vec{v}$ in the space breaks into two parts $\vec{v} = \vec{n} + \vec{r}$
where $\vec{n} \in \mathscr{N}$ and $\vec{r} \in \mathscr{R}$, and recall also that if $B_{\mathscr{N}}$ and $B_{\mathscr{R}}$ are bases for $\mathscr{N}$
and $\mathscr{R}$ then the concatenation $B_{\mathscr{N}} {}^\frown B_{\mathscr{R}}$ is linearly independent (and so the two
parts of $\vec{v}$ do not "overlap"). The next result says that for any subspaces $\mathscr{N}$ and
$\mathscr{R}$ that are complementary as well as t invariant, the action of t on $\vec{v}$ breaks into
the "non-overlapping" actions of t on $\vec{n}$ and on $\vec{r}$.

2.9 Lemma Let $t\colon V \to V$ be a transformation and let $\mathscr{N}$ and $\mathscr{R}$ be t invariant
complementary subspaces of V. Then we can represent t by a matrix with blocks
of square submatrices $T_1$ and $T_2$

$$\begin{pmatrix} T_1 & Z_2 \\ Z_1 & T_2 \end{pmatrix} \quad \begin{matrix} \}\ \dim(\mathscr{N})\text{-many rows} \\ \}\ \dim(\mathscr{R})\text{-many rows} \end{matrix}$$

where $Z_1$ and $Z_2$ are blocks of zeroes.

Proof Since the two subspaces are complementary, the concatenation of a basis
for $\mathscr{N}$ and a basis for $\mathscr{R}$ makes a basis $B = \langle \vec{\nu}_1, \ldots, \vec{\nu}_p, \vec{\mu}_1, \ldots, \vec{\mu}_q \rangle$ for V. We
shall show that the matrix

$$\mathrm{Rep}_{B,B}(t) = \begin{pmatrix} \vdots & & \vdots \\ \mathrm{Rep}_B(t(\vec{\nu}_1)) & \cdots & \mathrm{Rep}_B(t(\vec{\mu}_q)) \\ \vdots & & \vdots \end{pmatrix}$$

has the desired form.

Any vector $\vec{v} \in V$ is a member of $\mathscr{N}$ if and only if when it is represented
with respect to B its final q coefficients are zero. As $\mathscr{N}$ is t invariant, each of
the vectors $\mathrm{Rep}_B(t(\vec{\nu}_1)), \ldots, \mathrm{Rep}_B(t(\vec{\nu}_p))$ has this form. Hence the lower left
of $\mathrm{Rep}_{B,B}(t)$ is all zeroes. The argument for the upper right is similar. QED

To see that we have decomposed t into its action on the parts, let
$B_{\mathscr{N}} = \langle \vec{\nu}_1, \ldots, \vec{\nu}_p \rangle$ and $B_{\mathscr{R}} = \langle \vec{\mu}_1, \ldots, \vec{\mu}_q \rangle$. Observe that the restrictions of t to the
subspaces $\mathscr{N}$ and $\mathscr{R}$ are represented with respect to the bases $B_{\mathscr{N}}, B_{\mathscr{N}}$ and
$B_{\mathscr{R}}, B_{\mathscr{R}}$ by the matrices $T_1$ and $T_2$. So with subspaces that are invariant and
complementary we can split the problem of examining a linear transformation
into two lower-dimensional subproblems. The next result illustrates this
decomposition into blocks.

2.10 Lemma If T is a matrix with square submatrices $T_1$ and $T_2$

$$T = \begin{pmatrix} T_1 & Z_2 \\ Z_1 & T_2 \end{pmatrix}$$

where the Z's are blocks of zeroes, then $|T| = |T_1| \cdot |T_2|$.

Proof Suppose that T is $n \times n$, that $T_1$ is $p \times p$, and that $T_2$ is $q \times q$. In the
permutation formula for the determinant

$$|T| = \sum_{\text{permutations } \phi} t_{1,\phi(1)}t_{2,\phi(2)} \cdots t_{n,\phi(n)}\,\mathrm{sgn}(\phi)$$

each term comes from a rearrangement of the column numbers $1, \ldots, n$ into a
new order $\phi(1), \ldots, \phi(n)$. The upper right block $Z_2$ is all zeroes, so if a $\phi$ has
at least one of $p + 1, \ldots, n$ among its first p column numbers $\phi(1), \ldots, \phi(p)$
then the term arising from $\phi$ does not contribute to the sum because it is zero,
e.g., if $\phi(1) = n$ then $t_{1,\phi(1)}t_{2,\phi(2)} \cdots t_{n,\phi(n)} = 0 \cdot t_{2,\phi(2)} \cdots t_{n,\phi(n)} = 0$.

So the above formula reduces to a sum over all permutations with two
halves: any contributing $\phi$ is the composition of a $\phi_1$ that rearranges only
$1, \ldots, p$ and a $\phi_2$ that rearranges only $p + 1, \ldots, p + q$. Now, the distributive
law and the fact that the signum of a composition is the product of the signums
gives that this

$$|T_1| \cdot |T_2| = \Bigl( \sum_{\substack{\text{perms } \phi_1 \\ \text{of } 1, \ldots, p}} t_{1,\phi_1(1)} \cdots t_{p,\phi_1(p)}\,\mathrm{sgn}(\phi_1) \Bigr) \cdot \Bigl( \sum_{\substack{\text{perms } \phi_2 \\ \text{of } p+1, \ldots, p+q}} t_{p+1,\phi_2(p+1)} \cdots t_{p+q,\phi_2(p+q)}\,\mathrm{sgn}(\phi_2) \Bigr)$$

equals $|T| = \sum_{\text{contributing } \phi} t_{1,\phi(1)}t_{2,\phi(2)} \cdots t_{n,\phi(n)}\,\mathrm{sgn}(\phi)$. QED
2.11 Example

[block-matrix determinant display not recoverable from the source] = 36



From Lemma 2.10 we conclude that if two subspaces are complementary and
t invariant then t is one-to-one if and only if its restriction to each subspace is
nonsingular.
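
Lemma 2.10 can be checked directly against the permutation formula used in its proof. The sketch below computes determinants by summing over all permutations and compares $|T|$ with $|T_1| \cdot |T_2|$. The blocks are sample values chosen for illustration (their determinants are 4 and 9), and the helper names are hypothetical.

```python
from itertools import permutations


def sign(perm):
    """Signum of a permutation, computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(M):
    """Determinant by the permutation expansion (practical only for small n)."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for row, col in enumerate(perm):
            term *= M[row][col]
        total += term
    return total


def block_diagonal(T1, T2):
    """Matrix with T1 upper left, T2 lower right, and zero blocks elsewhere."""
    p, q = len(T1), len(T2)
    top = [row + [0] * q for row in T1]
    bottom = [[0] * p + row for row in T2]
    return top + bottom


T1 = [[2, 0], [1, 2]]
T2 = [[3, 0], [0, 3]]
T = block_diagonal(T1, T2)
print(det(T), det(T1) * det(T2))
```

The permutation expansion has n! terms, so this is a check of the identity on small cases and not a practical way to compute determinants.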

Now for the promised third, and final, step to the main result. 

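2.12 Lemma If a linear transformation $t\colon V \to V$ has the characteristic polynomial
$(x - \lambda_1)^{p_1} \cdots (x - \lambda_k)^{p_k}$ then (1) $V = \mathscr{N}_\infty(t - \lambda_1) \oplus \cdots \oplus \mathscr{N}_\infty(t - \lambda_k)$ and
(2) $\dim(\mathscr{N}_\infty(t - \lambda_i)) = p_i$.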

Proof This argument consists of proving two preliminary claims, followed by 
proofs of clauses (1) and (2). 

The first claim is that $\mathscr{N}_\infty(t-\lambda_i)\cap\mathscr{N}_\infty(t-\lambda_j)=\{\vec{0}\}$ when $i\neq j$. By
Lemma 2.8 both $\mathscr{N}_\infty(t-\lambda_i)$ and $\mathscr{N}_\infty(t-\lambda_j)$ are $t$ invariant. The intersection
of $t$ invariant subspaces is $t$ invariant and so the restriction of $t$ to
$\mathscr{N}_\infty(t-\lambda_i)\cap\mathscr{N}_\infty(t-\lambda_j)$ is a linear transformation. Now, $t-\lambda_i$ is nilpotent on
$\mathscr{N}_\infty(t-\lambda_i)$ and $t-\lambda_j$ is nilpotent on $\mathscr{N}_\infty(t-\lambda_j)$, so both $t-\lambda_i$ and $t-\lambda_j$
are nilpotent on the intersection. Therefore by Lemma 2.1 and the observation following it, if $t$ has
any eigenvalues on the intersection then the "only" eigenvalue is both $\lambda_i$ and $\lambda_j$.
This cannot be, so the restriction has no eigenvalues: $\mathscr{N}_\infty(t-\lambda_i)\cap\mathscr{N}_\infty(t-\lambda_j)$
is the trivial space (Lemma 3.12 shows that the only transformation that is
without any eigenvalues is the transformation on the trivial space).

The second claim is that $\mathscr{N}_\infty(t-\lambda_i)\subseteq\mathscr{R}_\infty(t-\lambda_j)$, where $i\neq j$. To verify it
we will show that $t-\lambda_j$ is one-to-one on $\mathscr{N}_\infty(t-\lambda_i)$ so that, since $\mathscr{N}_\infty(t-\lambda_i)$
is $t-\lambda_j$ invariant by Lemma 2.8, the map $t-\lambda_j$ is an automorphism of the
subspace $\mathscr{N}_\infty(t-\lambda_i)$, and therefore that $\mathscr{N}_\infty(t-\lambda_i)$ is a subset of each $\mathscr{R}(t-\lambda_j)$,
$\mathscr{R}((t-\lambda_j)^2)$, etc. For the verification that the map is one-to-one suppose that
$\vec{v}\in\mathscr{N}_\infty(t-\lambda_i)$ is in the null space of $t-\lambda_j$, aiming to show that $\vec{v}=\vec{0}$.
Consider the map $[(t-\lambda_i)-(t-\lambda_j)]^n$. On the one hand, the only vector that
$(t-\lambda_i)-(t-\lambda_j)=\lambda_j-\lambda_i$ maps to zero is the zero vector. On the other hand,
as in the proof of Lemma 1.7 we can apply the binomial expansion to get this.

$$(t-\lambda_i)^n(\vec{v})-\binom{n}{1}(t-\lambda_i)^{n-1}(t-\lambda_j)^1(\vec{v})+\binom{n}{2}(t-\lambda_i)^{n-2}(t-\lambda_j)^2(\vec{v})-\cdots$$

The first term is zero because $\vec{v}\in\mathscr{N}_\infty(t-\lambda_i)$ while the remaining terms are
zero because $\vec{v}$ is in the null space of $t-\lambda_j$. Therefore $\vec{v}=\vec{0}$.

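With those two preliminary claims done we can prove clause (1), that the space
is the direct sum of the generalized null spaces. By Corollary III.2.2 the space is
the direct sum $V=\mathscr{N}_\infty(t-\lambda_1)\oplus\mathscr{R}_\infty(t-\lambda_1)$. By the second claim
$\mathscr{N}_\infty(t-\lambda_2)\subseteq\mathscr{R}_\infty(t-\lambda_1)$ and so we can get a basis for $\mathscr{R}_\infty(t-\lambda_1)$ by starting with a basis for
$\mathscr{N}_\infty(t-\lambda_2)$ and adding extra basis elements taken from $\mathscr{R}_\infty(t-\lambda_1)\cap\mathscr{R}_\infty(t-\lambda_2)$.
Thus $V=\mathscr{N}_\infty(t-\lambda_1)\oplus\mathscr{N}_\infty(t-\lambda_2)\oplus(\mathscr{R}_\infty(t-\lambda_1)\cap\mathscr{R}_\infty(t-\lambda_2))$. Continuing
in this way we get this.

$$V=\mathscr{N}_\infty(t-\lambda_1)\oplus\cdots\oplus\mathscr{N}_\infty(t-\lambda_k)\oplus(\mathscr{R}_\infty(t-\lambda_1)\cap\cdots\cap\mathscr{R}_\infty(t-\lambda_k))$$

The first claim above shows that the final space is trivial.

We finish by verifying clause (2). Decompose $V$ as $\mathscr{N}_\infty(t-\lambda_i)\oplus\mathscr{R}_\infty(t-\lambda_i)$
and apply Lemma 2.9.

$$T=\begin{pmatrix}T_1&Z_2\\Z_1&T_2\end{pmatrix}\quad\begin{matrix}\}\ \dim(\mathscr{N}_\infty(t-\lambda_i))\text{-many rows}\\ \}\ \dim(\mathscr{R}_\infty(t-\lambda_i))\text{-many rows}\end{matrix}$$
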

Lemma 2.10 says that $|T-xI|=|T_1-xI|\cdot|T_2-xI|$. By the uniqueness clause
of the Fundamental Theorem of Algebra, Theorem I.1.11, the determinants of
the blocks have the same factors as the characteristic polynomial: $|T_1-xI|=(x-\lambda_1)^{q_1}\cdots(x-\lambda_k)^{q_k}$
and $|T_2-xI|=(x-\lambda_1)^{r_1}\cdots(x-\lambda_k)^{r_k}$, where $q_1+r_1=p_1$,
..., $q_k+r_k=p_k$. We will finish by establishing that (i) $q_j=0$ for all $j\neq i$, and
(ii) $q_i=p_i$. Together these prove clause (2) because they show that the degree
of the polynomial $|T_1-xI|$ is $q_i$, and the degree of that polynomial equals the
dimension of the generalized null space $\mathscr{N}_\infty(t-\lambda_i)$.

For (i), because the restriction of $t-\lambda_i$ to $\mathscr{N}_\infty(t-\lambda_i)$ is nilpotent on that
space, $t$'s only eigenvalue on that space is $\lambda_i$, by Lemma 2.2. So $q_j=0$ for $j\neq i$.

For (ii), consider the restriction of $t$ to $\mathscr{R}_\infty(t-\lambda_i)$. By Lemma III.2.1, the
map $t-\lambda_i$ is one-to-one on $\mathscr{R}_\infty(t-\lambda_i)$ and so $\lambda_i$ is not an eigenvalue of $t$ on
that subspace. Therefore $x-\lambda_i$ is not a factor of $|T_2-xI|$, so $r_i=0$, and so
$q_i=p_i$. QED

Recall the goal of this chapter, to give a canonical form for matrix similarity.
That result is next. It translates the above steps into matrix terms.

2.13 Theorem Any square matrix is similar to one in Jordan form

$$\begin{pmatrix}J_{\lambda_1}&&&&\text{zeroes}\\&J_{\lambda_2}\\&&\ddots\\&&&J_{\lambda_{k-1}}\\\text{zeroes}&&&&J_{\lambda_k}\end{pmatrix}$$

where each $J_\lambda$ is the Jordan block associated with an eigenvalue $\lambda$ of the
original matrix (that is, is all zeroes except for $\lambda$'s down the diagonal and some
subdiagonal ones).

Proof Given an $n\times n$ matrix $T$, consider the linear map $t\colon\mathbb{C}^n\to\mathbb{C}^n$ that it
represents with respect to the standard bases. Use the prior lemma to write
$\mathbb{C}^n=\mathscr{N}_\infty(t-\lambda_1)\oplus\cdots\oplus\mathscr{N}_\infty(t-\lambda_k)$ where $\lambda_1,\ldots,\lambda_k$ are the eigenvalues of $t$.
Because each $\mathscr{N}_\infty(t-\lambda_i)$ is $t$ invariant, Lemma 2.9 and the prior lemma show
that $t$ is represented by a matrix that is all zeroes except for square blocks along
the diagonal. To make those blocks into Jordan blocks, pick each $B_{\lambda_i}$ to be a
string basis for the action of $t-\lambda_i$ on $\mathscr{N}_\infty(t-\lambda_i)$. QED



2.14 Corollary Every square matrix is similar to the sum of a diagonal matrix 
and a nilpotent matrix. 

Strictly speaking, to make Jordan form a canonical form for matrix similarity 
classes it must be unique. That is, for any square matrix there needs to be one 
and only one matrix J similar to it and of the specified form. As we have stated 
it, the theorem allows us to rearrange the Jordan blocks. We could make this 
form unique, say by arranging the Jordan blocks from the least eigenvalue to 
greatest and arranging the blocks of subdiagonal ones inside each Jordan block 
from longest to shortest. 

2.15 Example This matrix has the characteristic polynomial $(x-2)^2(x-6)$.




We will handle the eigenvalues 2 and 6 separately.

First the eigenvalue 2. Computation of the powers of $T-2I$, and of the null
spaces and nullities, is routine. (Recall from Example 2.4 our convention of
taking $T$ to represent a transformation $t\colon\mathbb{C}^3\to\mathbb{C}^3$ with respect to the standard
basis.)

So the generalized null space $\mathscr{N}_\infty(t-2)$ has dimension two. We know that the
restriction of $t-2$ is nilpotent on this subspace. From the way that the nullities
grow we know that the action of $t-2$ on a string basis is $\vec{\beta}_1\mapsto\vec{\beta}_2\mapsto\vec{0}$. Thus
we can represent the restriction in the canonical form

$$N_2=\begin{pmatrix}0&0\\1&0\end{pmatrix}=\mathrm{Rep}_{B,B}(t-2)\qquad B_2=\langle\vec{\beta}_1,\vec{\beta}_2\rangle$$

(other choices of basis are possible). Consequently, the action of the restriction
of $t$ to $\mathscr{N}_\infty(t-2)$ is represented by this matrix.



$$J_2=N_2+2I=\mathrm{Rep}_{B_2,B_2}(t)=\begin{pmatrix}2&0\\1&2\end{pmatrix}$$

The second eigenvalue's computations are easier. Because the power of $x-6$
in the characteristic polynomial is one, the restriction of $t-6$ to $\mathscr{N}_\infty(t-6)$ must
be nilpotent of index one. Its action on a string basis must be $\vec{\beta}_3\mapsto\vec{0}$ and since
it is the zero map, its canonical form $N_6$ is the $1\times 1$ zero matrix. Consequently,
the canonical form $J_6$ for the action of $t$ on $\mathscr{N}_\infty(t-6)$ is the $1\times 1$ matrix with the
single entry 6. For the basis we can use any nonzero vector from the generalized
null space.

$$B_6=\langle\vec{\beta}_3\rangle$$

Taken together, these two give that the Jordan form of $T$ is

$$\mathrm{Rep}_{B,B}(t)=\begin{pmatrix}2&0&0\\1&2&0\\0&0&6\end{pmatrix}$$

where $B$ is the concatenation of $B_2$ and $B_6$.

2.16 Example Contrast the prior example with

$$T=\begin{pmatrix}2&2&1\\0&6&2\\0&0&2\end{pmatrix}$$

which has the same characteristic polynomial $(x-2)^2(x-6)$.
While the characteristic polynomial is the same,

$$\begin{array}{c|c|c|c}
p&(T-2I)^p&\mathscr{N}((t-2)^p)&\text{nullity}\\ \hline
1&\begin{pmatrix}0&2&1\\0&4&2\\0&0&0\end{pmatrix}&\{\begin{pmatrix}x\\-z/2\\z\end{pmatrix}\mid x,z\in\mathbb{C}\}&2\\
2&\begin{pmatrix}0&8&4\\0&16&8\\0&0&0\end{pmatrix}&\text{same}&\text{same}
\end{array}$$

here the action of $t-2$ is stable after only one application: the restriction of
$t-2$ to $\mathscr{N}_\infty(t-2)$ is nilpotent of index one. The restriction of $t-2$ to the
generalized null space acts on a string basis as $\vec{\beta}_1\mapsto\vec{0}$ and $\vec{\beta}_2\mapsto\vec{0}$, and we get
this Jordan block associated with the eigenvalue 2.

$$J_2=\begin{pmatrix}2&0\\0&2\end{pmatrix}$$

So the contrast with the prior example is that while the characteristic
polynomial tells us to look at the action of $t-2$ on its generalized null space, the
characteristic polynomial does not completely describe $t-2$'s action. We must
do some computations to find that the minimal polynomial is $(x-2)(x-6)$.

For the eigenvalue 6 the arguments for the second eigenvalue of the prior
example apply again. The restriction of $t-6$ to $\mathscr{N}_\infty(t-6)$ is nilpotent of
index one (it can't be of index less than one and since $x-6$ is a factor of the
characteristic polynomial with the exponent one it can't be of index more than
one either). Thus $t-6$'s canonical form $N_6$ is the $1\times 1$ zero matrix, and the
associated Jordan block $J_6$ is the $1\times 1$ matrix with entry 6.

$$\mathrm{Rep}_{B,B}(t)=\begin{pmatrix}2&0&0\\0&2&0\\0&0&6\end{pmatrix}\qquad B=B_2\mathbin{^\frown}B_6=\langle\vec{\beta}_1,\vec{\beta}_2,\vec{\beta}_3\rangle$$

Checking that the third vector in $B$ is in the null space of $t-6$ is routine.
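
The nullity computations in Example 2.16 can be checked mechanically. The following is a minimal sketch, not part of the original text: it assumes the matrix of Example 2.16 has rows (2 2 1), (0 6 2), (0 0 2), and the helper names `rank`, `matmul`, and `nullities` are made up for this illustration. It finds each nullity as the matrix size minus the rank, using exact rational arithmetic.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def nullities(t, eigenvalue, max_power):
    """Nullities of (T - eigenvalue*I)^p for p = 1, ..., max_power."""
    n = len(t)
    shifted = [[t[i][j] - (eigenvalue if i == j else 0) for j in range(n)]
               for i in range(n)]
    power = shifted
    result = []
    for _ in range(max_power):
        result.append(n - rank(power))
        power = matmul(power, shifted)
    return result


# Assumed reading of the matrix in Example 2.16.
T = [[2, 2, 1], [0, 6, 2], [0, 0, 2]]
print(nullities(T, 2, 2))  # nullities of T-2I and (T-2I)^2
print(nullities(T, 6, 2))  # nullities of T-6I and (T-6I)^2
```

A nullity that stops growing after the first power is what "nilpotent of index one" on the generalized null space looks like in this computation.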
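2.17 Example A bit of computing with

$$T=\begin{pmatrix}-1&4&0&0&0\\0&3&0&0&0\\0&-4&-1&0&0\\3&-9&-4&2&-1\\1&5&4&1&4\end{pmatrix}$$

shows that its characteristic polynomial is $(x-3)^3(x+1)^2$. This table

$$\begin{array}{c|c|c|c}
p&(T-3I)^p&\mathscr{N}((t-3)^p)&\text{nullity}\\ \hline
1&\begin{pmatrix}-4&4&0&0&0\\0&0&0&0&0\\0&-4&-4&0&0\\3&-9&-4&-1&-1\\1&5&4&1&1\end{pmatrix}&\{\begin{pmatrix}-(u+v)/2\\-(u+v)/2\\(u+v)/2\\u\\v\end{pmatrix}\mid u,v\in\mathbb{C}\}&2\\
2&\begin{pmatrix}16&-16&0&0&0\\0&0&0&0&0\\0&16&16&0&0\\-16&32&16&0&0\\0&-16&-16&0&0\end{pmatrix}&\{\begin{pmatrix}-z\\-z\\z\\u\\v\end{pmatrix}\mid z,u,v\in\mathbb{C}\}&3\\
3&\begin{pmatrix}-64&64&0&0&0\\0&0&0&0&0\\0&-64&-64&0&0\\64&-128&-64&0&0\\0&64&64&0&0\end{pmatrix}&\text{same}&\text{same}
\end{array}$$

shows that the restriction of $t-3$ to $\mathscr{N}_\infty(t-3)$ acts on a string basis via the
two strings $\vec{\beta}_1\mapsto\vec{\beta}_2\mapsto\vec{0}$ and $\vec{\beta}_3\mapsto\vec{0}$.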

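A similar calculation for the other eigenvalue

$$\begin{array}{c|c|c|c}
p&(T+1I)^p&\mathscr{N}((t+1)^p)&\text{nullity}\\ \hline
1&\begin{pmatrix}0&4&0&0&0\\0&4&0&0&0\\0&-4&0&0&0\\3&-9&-4&3&-1\\1&5&4&1&5\end{pmatrix}&\{\begin{pmatrix}-(u+v)\\0\\-v\\u\\v\end{pmatrix}\mid u,v\in\mathbb{C}\}&2\\
2&\begin{pmatrix}0&16&0&0&0\\0&16&0&0&0\\0&-16&0&0&0\\8&-40&-16&8&-8\\8&24&16&8&24\end{pmatrix}&\text{same}&\text{same}
\end{array}$$

shows that the restriction of $t+1$ to its generalized null space acts on a string
basis via the two separate strings $\vec{\beta}_4\mapsto\vec{0}$ and $\vec{\beta}_5\mapsto\vec{0}$.

Therefore $T$ is similar to this Jordan form matrix.

$$\begin{pmatrix}-1&0&0&0&0\\0&-1&0&0&0\\0&0&3&0&0\\0&0&1&3&0\\0&0&0&0&3\end{pmatrix}$$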
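
The step from a column of nullities to the lengths of the strings can be sketched in code. This is an illustration only, resting on the standard fact that the number of strings of length at least $p$ equals the growth in nullity from the power $p-1$ to the power $p$; the function name `jordan_block_sizes` is made up for this sketch.

```python
def jordan_block_sizes(nullities):
    """Sizes of the Jordan blocks for one eigenvalue, largest first.

    nullities[p-1] is the nullity of (T - lambda*I)^p, listed up to and
    including the power at which it stops growing.
    """
    # at_least[p-1] = number of blocks (strings) of size at least p
    at_least = []
    previous = 0
    for nullity in nullities:
        at_least.append(nullity - previous)
        previous = nullity
    at_least.append(0)  # no blocks are longer than the last power listed

    sizes = []
    for p in range(len(at_least) - 1, 0, -1):
        exactly_p = at_least[p - 1] - at_least[p]
        sizes.extend([p] * exactly_p)
    return sizes


# Example 2.17: nullities 2, 3, 3 for the powers of T-3I and 2, 2 for T+1I.
print(jordan_block_sizes([2, 3, 3]))  # blocks for the eigenvalue 3
print(jordan_block_sizes([2, 2]))     # blocks for the eigenvalue -1
```

For the eigenvalue 3 this gives one block of size two and one of size one, matching the two strings found above.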



Exercises 

2.18 Do the check for Example 2.4. 

2.19 Each matrix is in Jordan form. State its characteristic polynomial and its
minimal polynomial.

[The matrices for items (a) through (i) are not legible in this copy.]


✓ 2.20 Find the Jordan form from the given data.

(a) The matrix $T$ is $5\times 5$ with the single eigenvalue 3. The nullities of the powers
are: $T-3I$ has nullity two, $(T-3I)^2$ has nullity three, $(T-3I)^3$ has nullity four,
and $(T-3I)^4$ has nullity five.

(b) The matrix $S$ is $5\times 5$ with two eigenvalues. For the eigenvalue 2 the nullities
are: $S-2I$ has nullity two, and $(S-2I)^2$ has nullity four. For the eigenvalue $-1$
the nullities are: $S+1I$ has nullity one.

2.21 Find the change of basis matrices for each example. 

(a) Example 2.15 (b) Example 2.16 (c) Example 2.17 
✓ 2.22 Find the Jordan form and a Jordan basis for each matrix.

[The matrices for items (a) through (g) are not legible in this copy.]


✓ 2.23 Find all possible Jordan forms of a transformation with characteristic polynomial

2.24 Find all possible Jordan forms of a transformation with characteristic polynomial
$(x-1)^3(x+2)$.

✓ 2.25 Find all possible Jordan forms of a transformation with characteristic polynomial
$(x-2)^3(x+1)$ and minimal polynomial $(x-2)^2(x+1)$.

2.26 Find all possible Jordan forms of a transformation with characteristic polynomial
$(x-2)^4(x+1)$ and minimal polynomial $(x-2)^2(x+1)$.
✓ 2.27 Diagonalize these.
✓ 2.28 Find the Jordan matrix representing the differentiation operator on $\mathcal{P}_3$.
✓ 2.29 Decide if these two are similar.

$$\begin{pmatrix}1&-1\\4&-3\end{pmatrix}\qquad\begin{pmatrix}-1&0\\1&-1\end{pmatrix}$$
2.30 Find the Jordan form of this matrix. 

^0 -1 



Also give a Jordan basis. 
2.31 How many similarity classes are there for 3x3 matrices whose only eigenvalues 
are —3 and 4? 

✓ 2.32 Prove that a matrix is diagonalizable if and only if its minimal polynomial has
only linear factors.

2.33 Give an example of a linear transformation on a vector space that has no 
non-trivial invariant subspaces. 

2.34 Show that a subspace is t — Ai invariant if and only if it is t — A2 invariant. 

2.35 Prove or disprove: two nxn matrices are similar if and only if they have the 
same characteristic and minimal polynomials. 

2.36 The trace of a square matrix is the sum of its diagonal entries. 

(a) Find the formula for the characteristic polynomial of a 2 x 2 matrix. 

(b) Show that trace is invariant under similarity, and so we can sensibly speak of 
the 'trace of a map'. {Hint: see the prior item.) 

(c) Is trace invariant under matrix equivalence? 

(d) Show that the trace of a map is the sum of its eigenvalues (counting multi- 
plicities). 

(e) Show that the trace of a nilpotent map is zero. Does the converse hold? 

2.37 To use Definition 2.7 to check whether a subspace is t invariant, we seemingly 
have to check all of the infinitely many vectors in a (nontrivial) subspace to see if 
they satisfy the condition. Prove that a subspace is t invariant if and only if its 
subbasis has the property that for all of its elements, t(p) is in the subspace. 

✓ 2.38 Is $t$ invariance preserved under intersection? Under union? Complementation?
Sums of subspaces?

2.39 Give a way to order the Jordan blocks if some of the eigenvalues are complex 
numbers. That is, suggest a reasonable ordering for the complex numbers. 

2.40 Let $\mathcal{P}_j(\mathbb{R})$ be the vector space over the reals of degree $j$ polynomials. Show
that if $j\leq k$ then $\mathcal{P}_j(\mathbb{R})$ is an invariant subspace of $\mathcal{P}_k(\mathbb{R})$ under the differentiation
operator. In $\mathcal{P}_7(\mathbb{R})$, does any of $\mathcal{P}_0(\mathbb{R})$, ..., $\mathcal{P}_6(\mathbb{R})$ have an invariant complement?

2.41 In $\mathcal{P}_n(\mathbb{R})$, the vector space (over the reals) of degree $n$ polynomials,

$$\mathcal{E}=\{p(x)\in\mathcal{P}_n(\mathbb{R})\mid p(-x)=p(x)\text{ for all }x\}$$

and

$$\mathcal{O}=\{p(x)\in\mathcal{P}_n(\mathbb{R})\mid p(-x)=-p(x)\text{ for all }x\}$$

are the even and the odd polynomials; $p(x)=x^2$ is even while $p(x)=x^3$ is odd.
Show that they are subspaces. Are they complementary? Are they invariant under
the differentiation transformation?

2.42 Lemma 2.9 says that if $M$ and $N$ are invariant complements then $t$ has a
representation in the given block form (with respect to the same ending as starting
basis, of course). Does the implication reverse?

2.43 A matrix $S$ is the square root of another $T$ if $S^2=T$. Show that any nonsingular
matrix has a square root.



Method of Powers 



In applications the matrices can be quite large. Calculating eigenvalues and 
eigenvectors by finding and solving the characteristic polynomial can be too slow 
and too hard. There are techniques that avoid the characteristic polynomial. 
Here we shall see such a method that is suitable for large matrices that are 
sparse, meaning that the great majority of the entries are zero. 

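Suppose that the $n\times n$ matrix $T$ has $n$ distinct eigenvalues $\lambda_1,\lambda_2,\ldots,\lambda_n$.
Then $\mathbb{C}^n$ has a basis made of the associated eigenvectors $\langle\vec{\zeta}_1,\ldots,\vec{\zeta}_n\rangle$. For any
$\vec{v}\in\mathbb{C}^n$, writing $\vec{v}=c_1\vec{\zeta}_1+\cdots+c_n\vec{\zeta}_n$ and iterating $T$ on $\vec{v}$ gives these.

$$\begin{aligned}
T\vec{v}&=c_1\lambda_1\vec{\zeta}_1+c_2\lambda_2\vec{\zeta}_2+\cdots+c_n\lambda_n\vec{\zeta}_n\\
T^2\vec{v}&=c_1\lambda_1^2\vec{\zeta}_1+c_2\lambda_2^2\vec{\zeta}_2+\cdots+c_n\lambda_n^2\vec{\zeta}_n\\
T^3\vec{v}&=c_1\lambda_1^3\vec{\zeta}_1+c_2\lambda_2^3\vec{\zeta}_2+\cdots+c_n\lambda_n^3\vec{\zeta}_n\\
&\;\;\vdots\\
T^k\vec{v}&=c_1\lambda_1^k\vec{\zeta}_1+c_2\lambda_2^k\vec{\zeta}_2+\cdots+c_n\lambda_n^k\vec{\zeta}_n
\end{aligned}$$

Assuming that $|\lambda_1|$ is the largest and dividing through

$$\frac{T^k\vec{v}}{\lambda_1^k}=c_1\vec{\zeta}_1+c_2\frac{\lambda_2^k}{\lambda_1^k}\vec{\zeta}_2+\cdots+c_n\frac{\lambda_n^k}{\lambda_1^k}\vec{\zeta}_n$$

shows that as $k$ gets larger the fractions go to zero and so $\lambda_1$'s term will dominate
the expression. Thus, the entire expression has a limit of $c_1\vec{\zeta}_1$.

Thus if $c_1\neq 0$, as $k$ increases the vectors $T^k\vec{v}$ will tend toward the direction
of the eigenvectors associated with the dominant eigenvalue, and consequently
the ratios $\|T^k\vec{v}\|/\|T^{k-1}\vec{v}\|$ will tend toward that dominant eigenvalue.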

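For example the eigenvalues of the matrix

$$T=\begin{pmatrix}3&0\\8&-1\end{pmatrix}$$

are 3 and $-1$. If for instance $\vec{v}$ has the components 1 and 1 then

$$\begin{array}{c|c|c|c|c|c}
\vec{v}&T\vec{v}&T^2\vec{v}&\cdots&T^9\vec{v}&T^{10}\vec{v}\\ \hline
\begin{pmatrix}1\\1\end{pmatrix}&\begin{pmatrix}3\\7\end{pmatrix}&\begin{pmatrix}9\\17\end{pmatrix}&\cdots&\begin{pmatrix}19683\\39367\end{pmatrix}&\begin{pmatrix}59049\\118097\end{pmatrix}
\end{array}$$

and the ratio between the lengths of the last two is 2.9999.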
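
The table can be reproduced in a few lines. This is a minimal sketch, assuming the matrix with rows (3 0) and (8 -1) that appears in the Octave session later in this Topic; the helper names `apply` and `norm` are made up for the illustration. It iterates with exact integers and only uses floating point for the lengths.

```python
import math

T = [[3, 0], [8, -1]]


def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


# iterates[k] holds T^k v for the starting vector v = (1, 1)
iterates = [[1, 1]]
for _ in range(10):
    iterates.append(apply(T, iterates[-1]))

ratio = norm(iterates[10]) / norm(iterates[9])
print(iterates[9], iterates[10])
print(ratio)  # close to the dominant eigenvalue 3
```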

We shall note two implementation issues. First, instead of finding the powers
of $T$ and applying them to $\vec{v}$, we will compute $\vec{v}_1$ as $T\vec{v}$ and then compute $\vec{v}_2$ as
$T\vec{v}_1$, etc. (that is, we do not separately calculate $T^2$, $T^3$, ...). We can quickly
do these matrix-vector products even if $T$ is large, provided that it is sparse.
The second issue is that to avoid generating numbers that are so large that they
overflow our computer's capability, we can normalize the $\vec{v}_i$'s at each step. For
instance, we can divide each $\vec{v}_i$ by its length (other possibilities are to divide it
by its largest component, or simply by its first component). We thus implement
this method by generating

$$\begin{aligned}
\vec{w}_0&=\vec{v}_0/\|\vec{v}_0\|\\
\vec{v}_1&=T\vec{w}_0\\
\vec{w}_1&=\vec{v}_1/\|\vec{v}_1\|\\
\vec{v}_2&=T\vec{w}_1\\
&\;\;\vdots\\
\vec{w}_{k-1}&=\vec{v}_{k-1}/\|\vec{v}_{k-1}\|\\
\vec{v}_k&=T\vec{w}_{k-1}
\end{aligned}$$

until we are satisfied. Then $\vec{v}_k$ is an approximation of an eigenvector, and the
approximation of the dominant eigenvalue is the ratio $\|\vec{v}_k\|/\|\vec{w}_{k-1}\|$.

One way that we could be 'satisfied' is to iterate until our approximation of
the eigenvalue settles down. We could decide for instance to stop the iteration
process not after some fixed number of steps, but instead when $\|\vec{v}_k\|$ differs from
$\|\vec{v}_{k-1}\|$ by less than one percent, or when they agree up to the second significant
digit.
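
The normalized iteration and a stopping rule of this kind might be coded as follows. This is a sketch under stated assumptions rather than a production routine: the function name `power_method`, the absolute tolerance of 0.01, and the cap on the number of steps are all choices made for the illustration.

```python
import math


def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def power_method(matrix, start, tolerance=0.01, max_steps=1000):
    """Estimate the absolute value of the dominant eigenvalue.

    Returns (estimate, approximate eigenvector, number of steps used).
    Stops when two successive lengths ||v_k|| differ by less than tolerance.
    """
    w = [x / norm(start) for x in start]       # w_0 = v_0 / ||v_0||
    previous_length = None
    for step in range(1, max_steps + 1):
        v = apply(matrix, w)                   # v_k = T w_{k-1}
        length = norm(v)                       # ||v_k|| / ||w_{k-1}||, since ||w_{k-1}|| = 1
        if previous_length is not None and abs(length - previous_length) < tolerance:
            return length, v, step
        previous_length = length
        w = [x / length for x in v]            # w_k = v_k / ||v_k||
    raise RuntimeError("no convergence within max_steps")


estimate, vector, steps = power_method([[3, 0], [8, -1]], [1, 1])
print(estimate, vector, steps)
```

Because each $\vec{w}_{k-1}$ has length one, the estimate is simply the length of $\vec{v}_k$. Note that lengths only recover the absolute value of the dominant eigenvalue.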

The rate of convergence is determined by the rate at which the powers of
$|\lambda_2/\lambda_1|$ go to zero, where $\lambda_2$ is the eigenvalue of second largest norm. If that
ratio is much less than one then convergence is fast but if it is only slightly
less than one then convergence can be quite slow. Consequently, the method of
powers is not the most commonly used way of finding eigenvalues (although it is
the simplest one, which is why it is here). Instead, there are a variety of methods
that generally work by first replacing the given matrix $T$ with another that is
similar to it and so has the same eigenvalues, but is in some reduced form such
as tridiagonal form, where the only nonzero entries are on the diagonal, or just
above or below it. Then special techniques can find the eigenvalues. Once we
know the eigenvalues then we can easily compute the eigenvectors of $T$. These
other methods are outside of our scope. A good reference is [Goult, et al.].


Exercises 

1 Use ten iterations to estimate the largest eigenvalue of these matrices, starting
from the vector with components 1 and 2. Compare the answer with the one
obtained by solving the characteristic equation.

[The matrices for items (a) and (b) are not legible in this copy.]

2 Redo the prior exercise by iterating until $\|\vec{v}_k\|-\|\vec{v}_{k-1}\|$ has absolute value less
than 0.01. At each step, normalize by dividing each vector by its length. How many
iterations does it take? Are the answers significantly different?

3 Use ten iterations to estimate the largest eigenvalue of these matrices, starting
from the vector with components 1, 2, and 3. Compare the answer with the one
obtained by solving the characteristic equation.

$$\begin{pmatrix}-1&2&2\\2&2&2\\-3&-6&-6\end{pmatrix}$$

4 Redo the prior exercise by iterating until $\|\vec{v}_k\|-\|\vec{v}_{k-1}\|$ has absolute value less
than 0.01. At each step, normalize by dividing each vector by its length. How
many iterations does it take? Are the answers significantly different?

5 What happens if $c_1=0$? That is, what happens if the initial vector does not
have any component in the direction of the relevant eigenvector?

6 How can we adapt the method of powers to find the smallest eigenvalue? 

Computer Code 

This is the code for the computer algebra system Octave that did the calculation 
above. (It has been lightly edited to remove blank lines, etc.) 

>T=[3, 0;
    8, -1]
T=
  3   0
  8  -1
>v0=[1; 1]
v0=
  1
  1
>v1=T*v0
v1=
  3
  7
>v2=T*v1
v2=
  9
  17
>T9=T**9
T9=
  19683      0
  39368     -1
>T10=T**10
T10=
  59049      0
  118096     1
>v9=T9*v0
v9=
  19683
  39367
>v10=T10*v0
v10=
  59049
  118097
>norm(v10)/norm(v9)
ans=2.9999

Remark. This does not use the full power of Octave; it has built-in functions to
automatically apply sophisticated methods to find eigenvalues and eigenvectors.



Stable Populations 



Imagine a reserve park with animals from a species that we are trying to protect. 
The park doesn't have a fence and so animals cross the boundary, both from 
the inside out and from the outside in. Every year, 10% of the animals from 
inside of the park leave and 1% of the animals from the outside find their way in. 
We can ask if there is a stable level: are there populations for the park and the 
rest of the world that will stay constant over time, with the number of animals 
leaving equal to the number of animals entering? 

Let $p_n$ be the year $n$ population in the park and let $r_n$ be the population in
the rest of the world.

$$\begin{aligned}p_{n+1}&=.90p_n+.01r_n\\ r_{n+1}&=.10p_n+.99r_n\end{aligned}$$

We have this matrix equation.

$$\begin{pmatrix}p_{n+1}\\r_{n+1}\end{pmatrix}=\begin{pmatrix}.90&.01\\.10&.99\end{pmatrix}\begin{pmatrix}p_n\\r_n\end{pmatrix}$$

The population will be stable if $p_{n+1}=p_n$ and $r_{n+1}=r_n$ so that the matrix
equation $\vec{v}_{n+1}=T\vec{v}_n$ becomes $\vec{v}=T\vec{v}$. We are therefore looking for eigenvectors
for $T$ that are associated with the eigenvalue $\lambda=1$. The equation $\vec{0}=(\lambda I-T)\vec{v}=(I-T)\vec{v}$ is

$$\begin{pmatrix}0.10&-0.01\\-0.10&0.01\end{pmatrix}\begin{pmatrix}p\\r\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}$$

which gives the eigenspace: vectors with the restriction that $p=.1r$. For example,
if we start with a park population $p=10{,}000$ animals, so that the rest of the
world has $r=100{,}000$ animals then every year ten percent of those inside will
leave the park (this is a thousand animals), and every year one percent of those
from the rest of the world will enter the park (also a thousand animals). It is
stable, self-sustaining.
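
A quick check of the stable level can be written as a short program. This is a minimal sketch; the function name `next_year` is made up for the illustration, and exact fractions are used so that the equality test is not affected by rounding.

```python
from fractions import Fraction

# Transition matrix: 10% of the park animals leave, 1% of the outside animals enter.
T = [[Fraction(90, 100), Fraction(1, 100)],
     [Fraction(10, 100), Fraction(99, 100)]]


def next_year(park, rest):
    """Populations of the park and of the rest of the world one year later."""
    return (T[0][0] * park + T[0][1] * rest,
            T[1][0] * park + T[1][1] * rest)


print(next_year(10_000, 100_000))  # unchanged, since p = .1r
print(next_year(20_000, 100_000))  # not on the eigenspace, so it changes
```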

Now imagine that we are trying to raise the total world population of this
species. For instance we can try to have the world population grow at a regular
rate of 1% per year. This would make the population level stable in some sense,
although it is a dynamic stability in contrast to the static population level of
the $\lambda=1$ case. The equation $\vec{v}_{n+1}=1.01\cdot\vec{v}_n=T\vec{v}_n$ leads to $(1.01I-T)\vec{v}=\vec{0}$,
which gives this system.

$$\begin{pmatrix}0.11&-0.01\\-0.10&0.02\end{pmatrix}\begin{pmatrix}p\\r\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}$$

This matrix is nonsingular and so the only solution is $p=0$, $r=0$. Thus there is
no nontrivial initial population that would lead to a regular annual one percent
growth rate in $p$ and $r$.

We can look for the rates that allow an initial population for the park that
results in a steady growth behavior. We consider $\lambda\vec{v}=T\vec{v}$ and solve for $\lambda$.

$$0=\begin{vmatrix}\lambda-.9&-.01\\-.10&\lambda-.99\end{vmatrix}=(\lambda-.9)(\lambda-.99)-(.10)(.01)=\lambda^2-1.89\lambda+.89$$

We already know that $\lambda=1$ is one solution of this characteristic equation and
finding that the other eigenvalue is 0.89 is routine. Thus there are two ways to
have a dynamically stable $p$ and $r$ (where the two grow at the same rate despite
the leaky park boundaries): have a world population that does not grow or
shrink, and have a world population that shrinks by 11% every year.
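
The two roots can be confirmed numerically. This is a small sketch, assuming a $2\times 2$ matrix with real eigenvalues; the function name `eigenvalues_2x2` is made up for the illustration. It applies the quadratic formula to the characteristic polynomial, written in terms of the trace and the determinant.

```python
import math


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of the matrix with rows (a b) and (c d), assuming they are real.

    The characteristic polynomial is x^2 - (a+d)x + (ad-bc).
    """
    trace = a + d
    determinant = a * d - b * c
    root = math.sqrt(trace * trace - 4 * determinant)
    return (trace + root) / 2, (trace - root) / 2


print(eigenvalues_2x2(0.90, 0.01, 0.10, 0.99))  # approximately 1 and 0.89
```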

This is one way to look at eigenvalues and eigenvectors — they give a stable 
state for a system. If the eigenvalue is one then the system is static. If the 
eigenvalue isn't one then it is a dynamic stability, where the parts of the system 
grow or shrink together. 

Exercises 

1 For the park discussed above, what should be the initial park population in the 
case where the populations decline by 11 % every year? 

2 What will happen to the population of the park in the event of a growth in world 
population of 1% per year? Will it lag the world growth, or lead it? Assume 
that the initial park population is ten thousand, and the world population is one 
hundred thousand, and calculate over a ten year span. 

3 The park discussed above is partially fenced so that now, every year, only 5% of 
the animals from inside of the park leave (still, about 1 % of the animals from the 
outside find their way in). Under what conditions can the park maintain a stable 
population now? 

4 Suppose that a species of bird only lives in Canada, the United States, or in Mexico. 
Every year, 4% of the Canadian birds travel to the US, and 1% of them travel to 
Mexico. Every year, 6% of the US birds travel to Canada, and 4% go to Mexico. 
From Mexico, every year 10% travel to the US, and 0% go to Canada.

(a) Give the transition matrix. 

(b) Is there a way for the three countries to have constant populations? 

(c) Find all stable situations. 



Page Ranking 



Imagine that you are looking for the best book on Linear Algebra. You probably 
would try a web search engine such as Google. These list pages ranked by importance.
The ranking is defined, as Google's founders have said in [Brin & Page],
that a page is important if other important pages link to it: "a page can have 
a high PageRank if there are many pages that point to it, or if there are some 
pages that point to it and have a high PageRank." But isn't that circular — 
how can they tell whether a page is important without first deciding on the 
important pages? With eigenvalues and eigenvectors. 

We will present a simplified version of the Page Rank algorithm. For that 
we will model the World Wide Web as a collection of pages connected by links. 
This diagram, from [Wills], shows the pages as circles, and the links as arrows; 
for instance, page pi has a link to page p2. 




The key idea is that pages should be highly ranked if they are cited often
by other pages. That is, we raise the importance of a page $p_i$ if it is linked-to
from page $p_j$. The increment depends on the importance of the linking page $p_j$
divided by how many out-links $a_j$ are on that page.

$$\mathcal{I}(p_i)=\sum_{\text{in-linking pages }p_j}\frac{\mathcal{I}(p_j)}{a_j}$$

This matrix stores the information.

$$\begin{pmatrix}0&0&1/3&0\\1&0&1/3&0\\0&1&0&0\\0&0&1/3&0\end{pmatrix}$$

The algorithm's inventors describe a way to think about that matrix.

    PageRank can be thought of as a model of user behavior. We
    assume there is a 'random surfer' who is given a web page at random
    and keeps clicking on links, never hitting "back" ... The probability
    that the random surfer visits a page is its PageRank. [Brin & Page]

In the diagram, a surfer on page $p_3$ has a probability 1/3 of going next to each
of the other pages.

That leads us to the problem of page $p_4$. Many targets of links are dangling
or sink links, without any outbound links. (For instance, a page may link to an
image.) The simplest way to model what could happen next is to imagine that
when the surfer gets to a page like this then they go to a next page entirely at
random.

$$H=\begin{pmatrix}0&0&1/3&1/4\\1&0&1/3&1/4\\0&1&0&1/4\\0&0&1/3&1/4\end{pmatrix}$$

We will find vector $\vec{\mathcal{I}}$ whose components are the importance rankings of each
page $\mathcal{I}(p_i)$. With this notation, our requirements for the page rank are that
$H\vec{\mathcal{I}}=\vec{\mathcal{I}}$. That is, we want an eigenvector of the matrix associated with the
eigenvalue $\lambda=1$.

Here is Sage's calculation of the eigenvectors (slightly edited to fit on the
page).

sage: H=matrix([[0,0,1/3,1/4], [1,0,1/3,1/4], [0,1,0,1/4], [0,0,1/3,1/4]])
sage: H.eigenvectors_right()
[(1, [
(1, 2, 9/4, 1)
], 1), (0, [
(0, 1, 3, -4)
], 1), (-0.3750000000000000? - 0.4389855730355308?*I,
[(1, -0.1250000000000000? + 1.316956719106593?*I,
-1.875000000000000? - 1.316956719106593?*I, 1)], 1),
(-0.3750000000000000? + 0.4389855730355308?*I,
[(1, -0.1250000000000000? - 1.316956719106593?*I,
-1.875000000000000? + 1.316956719106593?*I, 1)], 1)]

The eigenvector that Sage gives associated with the eigenvalue $\lambda=1$ is this.

$$\begin{pmatrix}1\\2\\9/4\\1\end{pmatrix}$$

Of course, there are many vectors in that eigenspace. To get a page rank number
we normalize to length one.

sage: v=vector([1, 2, 9/4, 1])
sage: v/v.norm()
(4/177*sqrt(177), 8/177*sqrt(177), 3/59*sqrt(177), 4/177*sqrt(177))
sage: w=v/v.norm()
sage: w.n()
(0.300658411201132, 0.601316822402263, 0.676481425202546, 0.300658411201132)

So we rank the first and fourth pages as of equal importance. We rank the
second and third pages as much more important than those, and about equal in
importance to each other.
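
The same check can be done without Sage. The following is a minimal sketch in plain Python: it verifies with exact fractions that the vector with components 1, 2, 9/4, and 1 is fixed by $H$, and then normalizes it to length one. The variable names are choices made for this illustration.

```python
import math
from fractions import Fraction as F

H = [[0, 0, F(1, 3), F(1, 4)],
     [1, 0, F(1, 3), F(1, 4)],
     [0, 1, 0, F(1, 4)],
     [0, 0, F(1, 3), F(1, 4)]]

v = [1, 2, F(9, 4), 1]

# Exact matrix-vector product; equality with v means eigenvalue 1.
Hv = [sum(h * x for h, x in zip(row, v)) for row in H]
print(Hv == v)

# Normalize to length one to get the page rank numbers.
length = math.sqrt(sum(float(x) ** 2 for x in v))
ranks = [float(x) / length for x in v]
print(ranks)
```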

We'll add one more refinement. We will allow the surfer to pick a new
page at random even if they are not on a dangling page. Let this happen with
probability $1-\alpha$.

$$G=\alpha\cdot\begin{pmatrix}0&0&1/3&1/4\\1&0&1/3&1/4\\0&1&0&1/4\\0&0&1/3&1/4\end{pmatrix}+(1-\alpha)\cdot\begin{pmatrix}1/4&1/4&1/4&1/4\\1/4&1/4&1/4&1/4\\1/4&1/4&1/4&1/4\\1/4&1/4&1/4&1/4\end{pmatrix}$$

This is the Google matrix.

In practice $\alpha$ is typically between 0.85 and 0.99. Here are the ranks for the
four pages with a spread of $\alpha$'s.



$$\begin{array}{c|cccc}
\alpha&0.85&0.90&0.95&0.99\\ \hline
p_1&0.325&0.317&0.309&0.302\\
p_2&0.602&0.602&0.602&0.601\\
p_3&0.652&0.661&0.669&0.675\\
p_4&0.325&0.317&0.309&0.302
\end{array}$$
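
Ranks like the ones in this table can be approximated with the method of powers. This is a sketch under stated assumptions, not the procedure of any search engine: the function names `google_matrix` and `page_ranks`, the uniform starting vector, and the fixed count of 200 iterations are all choices made for the illustration.

```python
import math

# The matrix H for the four-page web, with the dangling page p4 sending
# the surfer to a page chosen at random.
H = [[0, 0, 1 / 3, 1 / 4],
     [1, 0, 1 / 3, 1 / 4],
     [0, 1, 0, 1 / 4],
     [0, 0, 1 / 3, 1 / 4]]


def google_matrix(h, alpha):
    """G = alpha*H + (1-alpha)*S where every entry of S is 1/n."""
    n = len(h)
    return [[alpha * h[i][j] + (1 - alpha) / n for j in range(n)]
            for i in range(n)]


def page_ranks(h, alpha, steps=200):
    """Approximate the eigenvector of G for eigenvalue 1, normalized to length one."""
    g = google_matrix(h, alpha)
    v = [1.0 / len(h)] * len(h)
    for _ in range(steps):
        v = [sum(a * x for a, x in zip(row, v)) for row in g]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
    return v


for alpha in (0.85, 0.90, 0.95, 0.99):
    print(alpha, [round(x, 3) for x in page_ranks(H, alpha)])
```

Iterating works here because the other eigenvalues of $G$ are smaller than one in absolute value, so the component along the eigenvector for $\lambda=1$ comes to dominate.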



The details of the algorithms used by commercial search engines are se- 
cret, no doubt have many refinements, and also change frequently. But the 
inventors of Google were gracious enough to outline the basis for their work in 
[Brin & Page]. A more current source is [Wikipedia Google Page Rank]. Two 
additional excellent expositions are [Wills] and [Austin]. 

Exercises 

1 A square matrix is stochastic if the sum of the entries in each column is one. The
Google matrix is computed by taking a combination $G=\alpha\cdot H+(1-\alpha)\cdot S$ of two
stochastic matrices. Show that $G$ must be stochastic.

2 For this web of pages, the importance of each page should be equal. Verify it for
$\alpha=0.85$.

[The diagram of this web of pages is not legible in this copy.]

3 [Bryan & Leise] Give the importance ranking for this web of pages.

[The diagram of this web of pages is not legible in this copy.]

(a) Use $\alpha=0.85$.

(b) Use $\alpha=0.95$.

(c) Observe that while $p_3$ is linked-to from all other pages, and therefore seems
important, it is not the highest ranked page. What is the highest ranked page?
Explain.



Linear Recurrences 



In 1202 Leonardo of Pisa, known as Fibonacci, posed this problem. 

A certain man put a pair of rabbits in a place surrounded on all sides 
by a wall. How many pairs of rabbits can be produced from that 
pair in a year if it is supposed that every month each pair begets a 
new pair which from the second month on becomes productive? 

This moves past an elementary exponential growth model for populations to 
include the fact that there is an initial period where newborns are not fertile. 
However, it retains other simplifying assumptions such as that there is no 
gestation period and no mortality. 

To get next month's total number of pairs we add the number of pairs alive 
this month to the number of pairs that will be newly born next month. The 
latter number is the number of pairs of parents that will be productive next 
month, which is the number that next month will have been alive for at least 
two months, and that is the number that were alive last month. 

$$f(n+1)=f(n)+f(n-1)\quad\text{where }f(0)=0,\ f(1)=1$$

We call this a recurrence relation because f recurs in its own defining equation. 
With it we can answer Fibonacci's twelve-month question. 



$$\begin{array}{r|ccccccccccccc}
\text{month}&0&1&2&3&4&5&6&7&8&9&10&11&12\\ \hline
\text{pairs}&0&1&1&2&3&5&8&13&21&34&55&89&144
\end{array}$$
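
The table comes from simply running the recurrence forward. A minimal sketch, with the function name `fibonacci` chosen for this illustration:

```python
def fibonacci(n):
    """f(n) from f(n+1) = f(n) + f(n-1) with f(0) = 0 and f(1) = 1."""
    previous, current = 0, 1  # f(0), f(1)
    for _ in range(n):
        previous, current = current, previous + current
    return previous


print([fibonacci(month) for month in range(13)])
```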



The sequence of numbers defined by the above equation is the Fibonacci sequence.
We will give a formula to calculate $f(n+1)$ without having to first
calculate $f(n)$, $f(n-1)$, etc.

We can give the recurrence a matrix formulation.

$$\begin{pmatrix}1&1\\1&0\end{pmatrix}\begin{pmatrix}f(n)\\f(n-1)\end{pmatrix}=\begin{pmatrix}f(n+1)\\f(n)\end{pmatrix}\quad\text{where }\begin{pmatrix}f(1)\\f(0)\end{pmatrix}=\begin{pmatrix}1\\0\end{pmatrix}$$

Writing $T$ for the matrix and $\vec{v}_n$ for the vector with components $f(n+1)$ and
$f(n)$, we have that $\vec{v}_n=T^n\vec{v}_0$. The advantage of this formulation comes from
diagonalizing $T$ because then we have a fast way to compute its powers: if
$T=PDP^{-1}$ then $T^n=PD^nP^{-1}$ and the $n$-th power of the diagonal matrix $D$
is the diagonal matrix whose entries are the $n$-th powers of the entries of $D$.
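
The relation $\vec{v}_n=T^n\vec{v}_0$ can be tried out directly with integer arithmetic. Note that this sketch does not diagonalize; it swaps in repeated squaring as a different way to get the powers of $T$ quickly, and the names `mat_mult`, `mat_pow`, and `fib_by_matrix` are made up for the illustration.

```python
def mat_mult(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]


def mat_pow(m, n):
    """n-th power of a 2x2 matrix, by repeated squaring."""
    result = [[1, 0], [0, 1]]  # identity matrix
    while n > 0:
        if n % 2 == 1:
            result = mat_mult(result, m)
        m = mat_mult(m, m)
        n //= 2
    return result


T = [[1, 1], [1, 0]]


def fib_by_matrix(n):
    """Second component of T^n v_0, where v_0 has components f(1)=1 and f(0)=0."""
    power = mat_pow(T, n)
    return power[1][0] * 1 + power[1][1] * 0


print(fib_by_matrix(12))
```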

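The characteristic equation of $T$ is $\lambda^2-\lambda-1=0$. The quadratic formula gives
its roots as $(1+\sqrt{5})/2$ and $(1-\sqrt{5})/2$. (Remark: these are sometimes called
"golden ratios;" see [Falbo].) Diagonalizing gives this.

$$\begin{pmatrix}1&1\\1&0\end{pmatrix}=\begin{pmatrix}\frac{1+\sqrt5}{2}&\frac{1-\sqrt5}{2}\\1&1\end{pmatrix}\begin{pmatrix}\frac{1+\sqrt5}{2}&0\\0&\frac{1-\sqrt5}{2}\end{pmatrix}\begin{pmatrix}\frac{1}{\sqrt5}&-\frac{1-\sqrt5}{2\sqrt5}\\-\frac{1}{\sqrt5}&\frac{1+\sqrt5}{2\sqrt5}\end{pmatrix}$$

Introducing the vectors and taking the $n$-th power, we have

$$\begin{pmatrix}f(n+1)\\f(n)\end{pmatrix}=\begin{pmatrix}1&1\\1&0\end{pmatrix}^n\begin{pmatrix}f(1)\\f(0)\end{pmatrix}=\begin{pmatrix}\frac{1+\sqrt5}{2}&\frac{1-\sqrt5}{2}\\1&1\end{pmatrix}\begin{pmatrix}\left(\frac{1+\sqrt5}{2}\right)^n&0\\0&\left(\frac{1-\sqrt5}{2}\right)^n\end{pmatrix}\begin{pmatrix}\frac{1}{\sqrt5}&-\frac{1-\sqrt5}{2\sqrt5}\\-\frac{1}{\sqrt5}&\frac{1+\sqrt5}{2\sqrt5}\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix}$$

The calculation is ugly but not hard.

$$\begin{pmatrix}f(n+1)\\f(n)\end{pmatrix}=\begin{pmatrix}\frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^{n+1}-\left(\frac{1-\sqrt5}{2}\right)^{n+1}\right]\\[1ex]\frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^{n}-\left(\frac{1-\sqrt5}{2}\right)^{n}\right]\end{pmatrix}$$

We want the second component of that equation.

$$f(n)=\frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^{n}-\left(\frac{1-\sqrt5}{2}\right)^{n}\right]$$

This formula finds the value of any member of the sequence without having to
first find the intermediate values. Notice that $(1-\sqrt{5})/2\approx-0.618$ has absolute
value less than one and so its powers go to zero. Thus the formula giving $f(n)$
is dominated by its first term.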
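
The closed formula is easy to test against the recurrence. This is a minimal sketch using floating point, so it is only trustworthy for moderate $n$ where rounding error stays well below one half; the function names are made up for the illustration.

```python
import math


def fib_closed_form(n):
    """f(n) from the formula with the two roots of the characteristic equation."""
    sqrt5 = math.sqrt(5)
    first_root = (1 + sqrt5) / 2
    second_root = (1 - sqrt5) / 2
    return (first_root ** n - second_root ** n) / sqrt5


def fib_dominant_term(n):
    """Only the first term of the formula, dropping the root of absolute value < 1."""
    sqrt5 = math.sqrt(5)
    return ((1 + sqrt5) / 2) ** n / sqrt5


print(fib_closed_form(12))    # very close to 144
print(fib_dominant_term(12))  # already within one half of 144
```

Rounding the dominant term to the nearest integer gives the same values as the full formula, which illustrates the remark that the first term dominates.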

Although we have extended the elementary model of population growth by 
adding a delay period before the onset of fertility, we nonetheless still get a 
function that is asymptotically exponential. 

In general, a linear recurrence relation (or difference equation) has this
form.

$$f(n+1) = a_n f(n) + a_{n-1} f(n-1) + \cdots + a_{n-k} f(n-k)$$

This recurrence relation is homogeneous because there is no constant term, i.e.,
we can rewrite it into the form $0 = -f(n+1) + a_n f(n) + a_{n-1} f(n-1) + \cdots +
a_{n-k} f(n-k)$. We say that this relation is of order $k$. The relation along with
the initial conditions $f(0), \ldots, f(k)$ completely determines a sequence. For
instance, the Fibonacci relation is of order two and, along with the two initial
conditions $f(0) = 0$ and $f(1) = 1$, it determines the Fibonacci sequence simply
because we can compute any $f(n)$ by first computing $f(2)$, $f(3)$, etc. We shall
see how to use linear algebra to solve a linear recurrence relation, that is, to find a
formula that computes the $n$-th member of the sequence without having to first
compute the values of the prior members.

Let $V$ be the set of functions with domain $\mathbb{N} = \{0, 1, 2, \ldots\}$. (We shall use
the codomain $\mathbb{R}$ but we could use others, such as $\mathbb{C}$. Below we sometimes have
domain $\{1, 2, \ldots\}$ but it is not an important difference.) This is a vector space
with the usual meaning for addition and scalar multiplication of functions, that
the action of $f + g$ is $x \mapsto f(x) + g(x)$ and the action of $cf$ is $x \mapsto c \cdot f(x)$.

If we omit the initial conditions then there may be many functions satisfying
a recurrence. For example, the function $g$ whose first few values are $g(0) = 2$,
$g(1) = 1$, $g(2) = 3$, $g(3) = 4$, and $g(4) = 7$ solves the Fibonacci relation without
the Fibonacci initial conditions.

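Fix a relation and consider the subset $S$ of functions satisfying the relation
without initial conditions. We claim that it is a subspace of $V$. It is nonempty
because the zero function is a solution. It is closed under addition since if $f_1$
and $f_2$ are solutions, then this holds (here $a_{n+1} = -1$ is the coefficient of
$f(n+1)$ in the homogeneous form).

$$\begin{aligned} a_{n+1}(f_1 + f_2)(n+1) + \cdots + a_{n-k}(f_1 + f_2)(n-k) &= \bigl(a_{n+1}f_1(n+1) + \cdots + a_{n-k}f_1(n-k)\bigr) \\ &\quad + \bigl(a_{n+1}f_2(n+1) + \cdots + a_{n-k}f_2(n-k)\bigr) \\ &= 0 \end{aligned}$$

It is also closed under scalar multiplication.

$$\begin{aligned} a_{n+1}(rf_1)(n+1) + \cdots + a_{n-k}(rf_1)(n-k) &= r\bigl(a_{n+1}f_1(n+1) + \cdots + a_{n-k}f_1(n-k)\bigr) \\ &= r \cdot 0 \\ &= 0 \end{aligned}$$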
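We can find the dimension of $S$. Consider this map from the set of functions $S$
to the set of $k$-tall vectors (the column lists the initial values of $f$).

$$f \mapsto \begin{pmatrix} f(0) \\ f(1) \\ \vdots \end{pmatrix}$$
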
Exercise 3 shows that this map is linear. Because any solution of the recurrence
is uniquely determined by the $k$ initial conditions, this map is one-to-one and
onto. Thus it is an isomorphism and thus $S$ has dimension $k$, the order of the
recurrence.

So we can describe the set of solutions of our linear homogeneous recurrence
relation of order $k$ (again, without any initial conditions) by taking linear
combinations of a set having only $k$-many linearly independent functions.

To produce those functions we give the recurrence $f(n+1) = a_n f(n) + \cdots +
a_{n-k} f(n-k)$ a matrix formulation.







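$$\begin{pmatrix} a_n & a_{n-1} & a_{n-2} & \cdots & a_{n-k+1} & a_{n-k} \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & & & \\ 0 & 0 & 1 & & & \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} f(n) \\ f(n-1) \\ \vdots \\ f(n-k) \end{pmatrix} = \begin{pmatrix} f(n+1) \\ f(n) \\ \vdots \\ f(n-k+1) \end{pmatrix}$$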







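We want the characteristic function of the matrix, the determinant of $A - \lambda I$
where the above matrix is $A$. The pattern in the $2 \times 2$ case

$$\begin{vmatrix} a_n - \lambda & a_{n-1} \\ 1 & -\lambda \end{vmatrix} = \lambda^2 - a_n\lambda - a_{n-1}$$

and the $3 \times 3$ case

$$\begin{vmatrix} a_n - \lambda & a_{n-1} & a_{n-2} \\ 1 & -\lambda & 0 \\ 0 & 1 & -\lambda \end{vmatrix} = -\lambda^3 + a_n\lambda^2 + a_{n-1}\lambda + a_{n-2}$$

leads us to expect (and Exercise 4 verifies) that this is the characteristic equation.

$$\begin{vmatrix} a_n - \lambda & a_{n-1} & a_{n-2} & \cdots & a_{n-k+1} & a_{n-k} \\ 1 & -\lambda & 0 & \cdots & 0 & 0 \\ 0 & 1 & -\lambda & & & \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -\lambda \end{vmatrix} = \pm\bigl(-\lambda^k + a_n\lambda^{k-1} + a_{n-1}\lambda^{k-2} + \cdots + a_{n-k+1}\lambda + a_{n-k}\bigr)$$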



The $\pm$ is irrelevant to finding the roots so we will drop it. We say that the
polynomial is 'associated' with the recurrence relation.

If $-\lambda^k + a_n\lambda^{k-1} + a_{n-1}\lambda^{k-2} + \cdots + a_{n-k+1}\lambda + a_{n-k}$ has no repeated roots
then the matrix is diagonalizable and we can, in theory, get a formula for $f(n)$
as in the Fibonacci case. But because we know that the subspace of solutions
has dimension $k$, we do not need to do the diagonalization calculation provided
that we can exhibit $k$ linearly independent functions satisfying the relation.

Where $r_1, r_2, \ldots, r_k$ are the distinct roots, consider the functions $f_{r_1}(n) = r_1^n$
through $f_{r_k}(n) = r_k^n$ of powers of those roots. Exercise 5 shows that each is a
solution of the recurrence and that they form a linearly independent set. So if
the roots $r_1, \ldots, r_k$ of the associated polynomial are distinct then any solution
of the relation has the form $f(n) = c_1r_1^n + c_2r_2^n + \cdots + c_kr_k^n$ for $c_1, \ldots, c_k \in \mathbb{R}$.
(The case of repeated roots is similar but we won't cover it here; see any text on
Discrete Mathematics.)
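The claim that powers of distinct roots, and their linear combinations, solve the recurrence can be spot-checked numerically. This Python sketch uses a hypothetical order-two relation f(n+1) = f(n) + 6f(n-1), which is not from the text; its associated polynomial has the distinct roots 3 and -2. The helper `satisfies` and the coefficients 4 and -7 are arbitrary illustrative choices, and a finite check like this is evidence rather than a proof.

```python
def satisfies(f, a_n, a_n_minus_1, n_max=10):
    """Check f(n+1) == a_n*f(n) + a_n_minus_1*f(n-1) for n = 1, ..., n_max."""
    return all(f(n + 1) == a_n * f(n) + a_n_minus_1 * f(n - 1)
               for n in range(1, n_max + 1))

# Hypothetical relation f(n+1) = f(n) + 6 f(n-1); roots of -x^2 + x + 6 are 3 and -2.
roots = [3, -2]
power_functions = [lambda n, r=r: r ** n for r in roots]

def combination(n):
    """An arbitrary linear combination c1*3^n + c2*(-2)^n with c1 = 4, c2 = -7."""
    return 4 * 3 ** n - 7 * (-2) ** n

print([satisfies(f, 1, 6) for f in power_functions])
print(satisfies(combination, 1, 6))
```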

Now we bring in the initial conditions. Use them to solve for $c_1, \ldots, c_k$. For
instance, the polynomial associated with the Fibonacci relation is $-\lambda^2 + \lambda + 1$,
whose roots are $(1 \pm \sqrt{5})/2$ and so any solution of the Fibonacci equation has the
form $f(n) = c_1((1+\sqrt{5})/2)^n + c_2((1-\sqrt{5})/2)^n$. Including the initial conditions
for the cases $n = 0$ and $n = 1$ gives

$$\begin{aligned} c_1 + c_2 &= 0 \\ \tfrac{1+\sqrt{5}}{2}\,c_1 + \tfrac{1-\sqrt{5}}{2}\,c_2 &= 1 \end{aligned}$$

which yields $c_1 = 1/\sqrt{5}$ and $c_2 = -1/\sqrt{5}$, as we found above.

We close by considering the nonhomogeneous case, where the relation has
the form $f(n+1) = a_n f(n) + a_{n-1} f(n-1) + \cdots + a_{n-k} f(n-k) + b$ for some
nonzero $b$. We only need a small adjustment to make the transition from the
homogeneous case. This classic example illustrates.

In 1883, Edouard Lucas posed the following problem, today called the Tower 
of Hanoi . 

In the great temple at Benares, beneath the dome which marks 
the center of the world, rests a brass plate in which are fixed three 
diamond needles, each a cubit high and as thick as the body of a 
bee. On one of these needles, at the creation, God placed sixty four 
disks of pure gold, the largest disk resting on the brass plate, and 
the others getting smaller and smaller up to the top one. This is the 
Tower of Brahma. Day and night unceasingly the priests transfer 
the disks from one diamond needle to another according to the fixed 
and immutable laws of Bramah, which require that the priest on
duty must not move more than one disk at a time and that he must
place this disk on a needle so that there is no smaller disk below
it. When the sixty-four disks shall have been thus transferred from
the needle on which at the creation God placed them to one of the
other needles, tower, temple, and Brahmins alike will crumble into
dust, and with a thunderclap the world will vanish. (Translation of
[De Parville] from [Ball & Coxeter].) 

How many disk moves will it take? Instead of tackling the sixty four disk problem 
right away, we will consider the problem for smaller numbers of disks, starting 
with three. 

To begin, all three disks are on the same needle. 






After moving the small disk to the far needle, the mid-sized disk to the middle 
needle, and then moving the small disk to the middle needle we have this. 




Now we can move the big disk over. Then to finish we repeat the process of 
moving the two smaller disks, this time so that they end up on the third needle, 
on top of the big disk. 

So to move the bottom disk at a minimum we must first move the smaller 
disks to the middle needle, then move the big one, and then move all the smaller 
ones from the middle needle to the ending needle. Since this minimum suffices, 
we get this recurrence for the number of moves. 

T(n + 1 ) = T(n) + 1 + T(n) = 2T(n) + 1 where T(1 ) = 1 

We can easily compute the first few values of T. 



    n      1   2   3    4    5    6     7     8     9     10
    T(n)   1   3   7   15   31   63   127   255   511   1023



Of course, those numbers are one less than a power of two. To derive this
equation instead of just guessing at it, we write the original relation as $-1 =
-T(n+1) + 2T(n)$, consider the homogeneous relation $0 = -T(n) + 2T(n-1)$,
get its associated polynomial $-\lambda + 2$, which obviously has the single root $r_1 = 2$,
and conclude that functions satisfying the homogeneous relation take the form
$T(n) = c_1 2^n$.

That's the homogeneous solution. Now we need a particular solution. Because
the nonhomogeneous relation $-1 = -T(n+1) + 2T(n)$ is so simple, in a few
minutes (or by remembering the table) we can spot a particular solution, $T(n) =
-1$. So we have that (without yet considering the initial condition) any solution
of $T(n+1) = 2T(n) + 1$ is the sum of the homogeneous solution and this
particular solution: $T(n) = c_1 2^n - 1$. The initial condition $T(1) = 1$ now gives
that $c_1 = 1$, and we've gotten the formula that generates the table: the $n$-disk
Tower of Hanoi problem requires a minimum of $2^n - 1$ moves.
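The formula can be compared against the recurrence directly. This is a small Python sketch; the function name `hanoi_moves` is illustrative, and the loop simply iterates T(n+1) = 2T(n) + 1 starting from T(1) = 1.

```python
def hanoi_moves(n):
    """Minimum number of moves for n disks, from T(1) = 1 and T(n+1) = 2*T(n) + 1."""
    t = 1  # T(1)
    for _ in range(n - 1):
        t = 2 * t + 1
    return t

# Compare the iterated recurrence with the closed form 2^n - 1.
for n in range(1, 11):
    print(n, hanoi_moves(n), 2 ** n - 1)
```

Python integers have arbitrary precision, so the same function also handles the sixty-four disk case.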

Finding a particular solution in more complicated cases is, unsurprisingly, 
more complicated. A delightful and rewarding, but challenging, source on 
recurrence relations is [Graham, Knuth, Patashnik]. For more on the Tower 
of Hanoi, [Ball & Coxeter] or [Gardner 1957] are good starting points. So is 
[Hofstadter] . Some computer code for trying some recurrence relations follows 
the exercises. 

Exercises 

1 Solve each homogeneous linear recurrence relation.

(a) $f(n+1) = 5f(n) - 6f(n-1)$
(b) $f(n+1) = 4f(n-1)$

(c) $f(n+1) = 5f(n) - 2f(n-1) - 8f(n-2)$

2 Give a formula for the relations of the prior exercise, with these initial
conditions.

(a) $f(0) = 1$, $f(1) = 1$

(b) $f(0) = 0$, $f(1) = 1$

(c) $f(0) = 1$, $f(1) = 1$, $f(2) = 3$.

3 Check that the isomorphism given between $S$ and $\mathbb{R}^k$ is a linear map. We argue
above that this map is one-to-one. What is its inverse?

4 Show that the characteristic equation of the matrix is as stated, that is, is the
polynomial associated with the relation. (Hint: expanding down the final column,
and using induction will work.)

5 Given a homogeneous linear recurrence relation $f(n+1) = a_n f(n) + \cdots + a_{n-k} f(n-k)$,
let $r_1, \ldots, r_k$ be the roots of the associated polynomial.

(a) Prove that each function $f_{r_i}(n) = r_i^n$ satisfies the recurrence (without initial
conditions).

(b) Prove that no $r_i$ is 0.

(c) Prove that the set $\{f_{r_1}, \ldots, f_{r_k}\}$ is linearly independent.

6 (This refers to the value $T(64) = 18{,}446{,}744{,}073{,}709{,}551{,}615$ given in the
computer code below.) Transferring one disk per second, how many years would it
take the priests at the Tower of Hanoi to finish the job?

Computer Code 

This code allows the generation of the first few values of a function defined 
by a recurrence and initial conditions. It is in the Scheme dialect of LISP 
(specifically, it shows A. Jaffer's free scheme interpreter SCM although any 
Scheme implementation should work). 

First, the Tower of Hanoi code is a straightforward implementation of the 
recurrence. 

; T(1) = 1 and T(n) = 2*T(n-1) + 1
(define (tower-of-hanoi-moves n)
  (if (= n 1)
      1
      (+ (* (tower-of-hanoi-moves (- n 1))
            2)
         1)))

(Note for readers unused to recursive code: to compute $T(64)$, the computer
wants to compute $2 \cdot T(63) + 1$, which requires computing $T(63)$. The computer
puts the 'times 2' and the 'plus 1' aside for a moment to do that. It computes
$T(63)$ by using this same piece of code (that's what 'recursive' means), and to
do that it wants to compute $2 \cdot T(62) + 1$. This keeps up (the next step is to try
to do $T(62)$ while it holds the other arithmetic in waiting), until after 63 steps
the computer tries to compute $T(1)$. It then returns $T(1) = 1$, which allows
the computation of $T(2)$ to proceed, etc., up until the original computation of
$T(64)$ finishes.)

The next routine calculates a table of the first few values. (Some language
notes: '() is the empty list, that is, the empty sequence, and cons pushes
something onto the start of a list. Note that, in the last line, the procedure proc
is called on argument n.)

; Return the list (proc 1) ... (proc n).
(define (first-few-outputs proc n)
  (first-few-outputs-aux proc n '()))

; Work down from n to 1, pushing each output onto the front of lst.
(define (first-few-outputs-aux proc n lst)
  (if (< n 1)
      lst
      (first-few-outputs-aux proc (- n 1) (cons (proc n) lst))))

The session at the SCM prompt went like this. 

>(first-few-outputs tower-of-hanoi-moves 64) 
(1 3 7 15 31 63 127 255 511 1023 2047 4095 8191 16383 32767 
65535 131071 262143 524287 1048575 2097151 4194303 8388607 
16777215 33554431 67108863 134217727 268435455 536870911 
1073741823 2147483647 4294967295 8589934591 17179869183 
34359738367 68719476735 137438953471 274877906943 549755813887 
1099511627775 2199023255551 4398046511103 8796093022207 
17592186044415 35184372088831 70368744177663 140737488355327 
281474976710655 562949953421311 1125899906842623 
2251799813685247 4503599627370495 9007199254740991 
18014398509481983 36028797018963967 72057594037927935 
144115188075855871 288230376151711743 576460752303423487 
1152921504606846975 2305843009213693951 4611686018427387903 
9223372036854775807 18446744073709551615) 



This is a list of $T(1)$ through $T(64)$.

Mathematics is made of arguments (reasoned discourse that is, not
crockery-throwing). This section sketches the background material and argument
techniques that we use in the book.

This section only outlines these topics, giving an example or two and skipping
proofs. For more, these are classics: [Polya], [Quine], and [Halmos74]. [Beck] is
a recent book available online.

Propositions 

The point at issue in an argument is the proposition. Mathematicians usually
write the point in full before the proof and label it either Theorem for major
points, Corollary for points that follow immediately from a prior one, or Lemma
for when it is chiefly used to prove other results.

The statements expressing propositions can be complex, with many subparts. 
The truth or falsity of the entire proposition depends both on the truth value of 
the parts and on how the statement is put together. 

Not Where P is a proposition, 'it is not the case that P' is true provided that 
P is false. Thus, 'n is not prime' is true only when n is the product of smaller 
integers. 

So 'not' operates on statements, inverting their truth value. We can picture 
it with a Venn diagram. 



Where the box encloses all natural numbers, and inside the circle are the primes, 
the shaded area holds numbers satisfying 'not P'. 

To prove that a 'not P' statement holds, show that P is false. 

And Consider the statement form 'P and Q'. For it to be true both halves must
hold: '7 is prime and so is 3' is true, while '7 is prime and 3 is not' is false.
Here is the Venn diagram for 'P and Q'.

To prove 'P and Q', prove that each half holds.

Or A 'P or Q' statement is true when either half holds: '7 is prime or 4 is prime' 
is true, while '7 is not prime or 4 is prime' is false. We take 'or' to mean that if 
both halves are true '7 is prime or 4 is not' then the statement as a whole is 
true. (This is inclusive or. Occasionally in everyday speech people use 'or' in an 
exclusive way — "Live free or die" does not intend both halves to hold — but we 
will not use 'or' in that way.) 

The Venn diagram includes all of both circles. 




To prove 'P or Q', show that in all cases at least one half holds (perhaps 
sometimes one half and sometimes the other, but always at least one). 

If-then An 'if P then Q' statement (sometimes stated as 'P implies Q' or 
'P => Q' or 'P is sufficient to give Q' or 'Q if P') is true unless P is true while 
Q is false. 

There is a fine point here — 'if P then Q' is true when P is false, no matter 
what value Q has: 'if 4 is prime then 7 is prime' and 'if 4 is prime then 7 is not' 
are both true statements. (They are vacuously true.) Further, 'if P then Q' is 
true when Q is true, no matter what value P has: 'if 4 is prime then 7 is prime' 
and 'if 4 is not prime then 7 is prime' are both true. 

We adopt this definition of implication because we want statements such as
'if n is a perfect square then n is not prime' to be true no matter which number
n appears in that statement. For instance, we want 'if 5 is a perfect square
then 5 is not prime' to be true so we want that if both P and Q are false then
$P \Rightarrow Q$ is true.
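The truth table behind this definition can be written out mechanically. This Python sketch encodes 'if P then Q' as '(not P) or Q', which is one standard way to express the table described above; the helper names `implies`, `is_prime`, and `is_perfect_square` are illustrative, and the range of numbers checked is an arbitrary choice.

```python
def implies(p, q):
    """'if p then q' is false only when p is true and q is false."""
    return (not p) or q

def is_prime(n):
    """True when n is a prime natural number (trial division)."""
    return n > 1 and all(n % d != 0 for d in range(2, n))

def is_perfect_square(n):
    """True when n = m*m for some natural number m."""
    return any(m * m == n for m in range(n + 1))

# The four rows of the truth table, as (P, Q, 'if P then Q').
truth_table = [(p, q, implies(p, q)) for p in (True, False) for q in (True, False)]
for row in truth_table:
    print(row)

# 'if n is a perfect square then n is not prime' for every n checked.
always_true = all(implies(is_perfect_square(n), not is_prime(n)) for n in range(200))
print(always_true)
```

With this definition the statement about perfect squares holds for every n in the range, including n = 5 where both halves are false.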

The diagram 




shows that Q holds whenever P does. Notice again that if P does not hold then 
Q may or may not be in force. 






There are two main ways to establish an implication. The first way is
direct: assume that P is true and use that assumption to prove Q. For instance,
to show 'if a number is divisible by 5 then twice that number is divisible by
10', assume that the number is $5n$ and deduce that $2(5n) = 10n$. The second
way is indirect: prove the contrapositive statement: 'if Q is false then P is false'
(rephrased, 'Q can only be false when P is also false'). Thus to show 'if a number
is prime then it is not a perfect square', we can argue that if it were a square
$p = n^2$ then it could be factored $p = n \cdot n$ where $n < p$ and so wouldn't be
prime ($p = 0$ or $p = 1$ don't give $n < p$ but they are nonprime).

Note two things about this statement form. 

First, an 'if P then Q' result can sometimes be improved by weakening P
or strengthening Q. Thus, 'if a number is divisible by $p^2$ then its square is
also divisible by $p^2$' could be upgraded either by relaxing its hypothesis: 'if a
number is divisible by $p$ then its square is divisible by $p^2$', or by tightening its
conclusion: 'if a number is divisible by $p^2$ then its square is divisible by $p^4$'.

Second, after showing 'if P then Q' then a good next step is to look into
whether there are cases where Q holds but P does not. The idea is to better
understand the relationship between P and Q with an eye toward strengthening
the proposition.

Equivalence An if-then statement cannot be improved when not only does P
imply Q but also Q implies P. Some ways to say this are: 'P if and only if Q',
'P iff Q', 'P and Q are logically equivalent', 'P is necessary and sufficient to give
Q', '$P \Longleftrightarrow Q$'. An example is 'a number is divisible by a prime if and only if
that number squared is divisible by the prime squared'.

The picture shows that P and Q hold in exactly the same cases. 



Although in simple arguments a chain like "P if and only if R, which holds if 
and only if S ..." may be practical, typically we show equivalence by showing 
the two halves 'if P then Q' and 'if Q then P' separately. 

Quantifiers 

Compare these statements about natural numbers: 'there is an $x$ such that $x$
is divisible by $x^2$' is true, while 'for all numbers $x$, that $x$ is divisible by $x^2$' is
false. The 'there is' and 'for all' prefixes are quantifiers.

For all The 'for all' prefix is the universal quantifier, symbolized $\forall$.

In a sense the box we draw to border the Venn diagram shows the universal
quantifier since it delineates the universe of possible members.

To prove that a statement holds in all cases, show that it holds in each case.
Thus to prove that 'every number divisible by $p$ has its square divisible by $p^2$',
take a single number of the form $pn$ and square it: $(pn)^2 = p^2n^2$. This is a
"typical element" or "generic element" proof.

In this kind of argument we must be careful not to assume properties for that
element other than those in the hypothesis. Here is an example of a common
wrong argument: "if $n$ is divisible by a prime, say 2, so that $n = 2k$ for some
natural number $k$, then $n^2 = (2k)^2 = 4k^2$ and the square of $n$ is divisible by the
square of the prime." That is an argument about the case $p = 2$ but it isn't a
proof for general $p$. Contrast it with a correct one: "if $n$ is divisible by a prime
so that $n = pk$ for some natural number $k$, then $n^2 = (pk)^2 = p^2k^2$ and so the
square of $n$ is divisible by the square of the prime."

There exists The 'there exists' prefix is the existential quantifier, symbolized $\exists$.

A Venn diagram of 'there is a number such that P' shows both that there 
can be more than one and also that not all numbers need satisfy P. 




We can prove an existence proposition by producing something satisfying
the property: once, to settle the question of primality of $2^{2^5} + 1$, Euler produced
the divisor 641 [Sandifer]. But there are proofs showing that something exists
without saying how to find it; Euclid's argument given in the next subsection
shows there are infinitely many primes without giving a formula naming them.
In general, while demonstrating existence is better than nothing, giving an
example is better, and an exhaustive list of all instances is ideal.
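Euler's divisor can be confirmed with one line of integer arithmetic. This Python sketch assumes the number in question is 2^(2^5) + 1; the variable names are illustrative.

```python
# Check that 641 divides 2^(2^5) + 1, so the number is not prime.
fermat_5 = 2 ** (2 ** 5) + 1
quotient, remainder = divmod(fermat_5, 641)
print(fermat_5, quotient, remainder)
```

A zero remainder settles the existence question by producing the divisor, which is the style of argument described above.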

Finally, along with "Are there any?" we often ask "How many?" So the question
of uniqueness often arises in conjunction with questions of existence. Many 
times the two arguments are simpler if separated, so note that just as proving 
something exists does not show it is unique, neither does proving something is 
unique show that it exists. (Obviously 'the natural number halfway between 
three and four' would be unique, but no such number exists.) 

Techniques of Proof 

Induction Many proofs are iterative, "Here's why the statement is true for the
number 1, it then follows for 2 and from there to 3 ...". These are proofs by
induction. Such a proof has two steps. In the base step the proposition is
established for some first number, often 0 or 1. In the inductive step we show
that if the proposition holds for numbers up to and including some $k$ then it holds
for the next number $k + 1$.

Here is an example proving that $1 + 2 + 3 + \cdots + n = n(n+1)/2$.

For the base step we show that the formula holds when $n = 1$. That's
easy, the sum of the first 1 number does indeed equal $1(1+1)/2$.

For the inductive step, assume that the formula holds for the numbers
$1, 2, \ldots, k$ with $k \geq 1$. That is, assume all of these instances of the formula.

$$\begin{aligned} 1 &= 1(1+1)/2 \\ \text{and}\quad 1 + 2 &= 2(2+1)/2 \\ \text{and}\quad 1 + 2 + 3 &= 3(3+1)/2 \\ &\;\;\vdots \\ \text{and}\quad 1 + \cdots + k &= k(k+1)/2 \end{aligned}$$

This is the induction hypothesis. With this assumption we will deduce
that the formula also holds in the next case, $k + 1$.

$$1 + 2 + \cdots + k + (k+1) = \frac{k(k+1)}{2} + (k+1) = \frac{(k+1)(k+2)}{2}$$

(The first equality follows from the induction hypothesis.)

We've shown in the base case that the proposition holds for 1 . We've shown 
in the inductive step that if it holds for the case of 1 then it also holds for 2; 
therefore it does hold for 2. We've also shown in the inductive step that if the 
statement holds for the cases of 1 and 2 then it also holds for the next case 3. 
Continuing in this way, we get that the statement holds for any natural number 
greater than or equal to 1 . 

Here is another example, proving that every integer greater than 1 is a
product of primes.

The base step is easy: 2 is the product of a single prime.

For the inductive step assume that each of 2, 3, . . . , k is a product of primes, 

aiming to show k+1 is also a product of primes. There are two possibilities. 
First, if k + 1 is not divisible by a number smaller than itself then it is 
a prime and so is the product of primes. The second possibility is that 
k + 1 is divisible by a number smaller than itself, and then its factors can 
be written as a product of primes by the inductive hypothesis. In either 
case k + 1 can be rewritten as a product of primes. 
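The two cases of this inductive step translate naturally into a recursive procedure. This Python sketch is one possible rendering, with the hypothetical name `prime_factors`: if no smaller divisor exists the number is itself prime, and otherwise each of the two smaller factors is handled by the same procedure, just as the induction hypothesis handles them in the proof.

```python
def prime_factors(n):
    """Return a list of primes whose product is n, for an integer n > 1."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            # n = d * (n // d): both factors are smaller, so recurse on each.
            return prime_factors(d) + prime_factors(n // d)
        d += 1
    # No divisor between 2 and sqrt(n), so n is prime.
    return [n]

print(prime_factors(360))
print(prime_factors(97))
```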

There are two things to note about the 'next number' in an induction
argument. One thing is that while induction works on the integers, it's no good
on the reals since there is no 'next' real. The other thing is that we sometimes
use induction to go down, say, from 10 to 9 to 8, etc., down to 0. So 'next
number' could mean 'next lowest number'. Of course, at the end we have not
shown the fact for all natural numbers, only for those less than or equal to 10.

Contradiction Another technique of proof is to show that something is true by
showing that it cannot be false.

The classic example of proof by contradiction is Euclid's argument that there 
are infinitely many primes. 

Suppose there are only finitely many primes $p_1, \ldots, p_k$. Consider
$p_1 \cdot p_2 \cdots p_k + 1$. None of the primes on this supposedly exhaustive list divides
that number evenly since each leaves a remainder of 1 . But every number 
is a product of primes so this can't be. Therefore there cannot be only 
finitely many primes. 
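The key step, that each listed prime leaves a remainder of 1, is easy to see on a concrete list. This Python sketch uses the first six primes as a hypothetical "exhaustive" list; the variable names are illustrative.

```python
from math import prod

# A hypothetical finite list of primes, pretending it is exhaustive.
primes = [2, 3, 5, 7, 11, 13]

# Euclid's number: the product of the listed primes, plus one.
candidate = prod(primes) + 1

# Each listed prime leaves remainder 1, so none divides the candidate.
remainders = [candidate % p for p in primes]
print(candidate, remainders)
```

Here the candidate is 30031 = 59 * 509, so it is not itself prime; the argument only needs that its prime factors are missing from the list.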

Every proof by contradiction assumes that the proposition is false and derives
some contradiction to known facts. Another example is this proof that $\sqrt{2}$ is
not a rational number.

Suppose that $\sqrt{2} = m/n$.

Factor out any 2's, giving $n = 2^{k_n} \cdot \hat{n}$ and $m = 2^{k_m} \cdot \hat{m}$, where $\hat{n}$ and
$\hat{m}$ are odd. Rewrite.

$$2 \cdot (2^{k_n} \cdot \hat{n})^2 = (2^{k_m} \cdot \hat{m})^2$$

The Prime Factorization Theorem says that there must be the same
number of factors of 2 on both sides, but there are an odd number of them
$1 + 2k_n$ on the left and an even number of them $2k_m$ on the right. That's
a contradiction, so a rational with a square of 2 is impossible.

Both of these examples aimed to prove something doesn't exist. A negative 
proposition often suggests a proof by contradiction. 

Sets, Functions, and Relations 

Sets Mathematicians often work with collections. 

The most basic kind of collection is a set. We can describe a set as a listing
between curly braces as with $\{1, 4, 9, 16\}$ or by using set-builder notation as with
$\{x \mid x^5 - 3x^3 + 2 = 0\}$ (read "the set of all $x$ such that ..."). We name sets with
capital roman letters, for instance the set of primes is $P = \{2, 3, 5, 7, 11, \ldots\}$
(except that a few sets are so important that their names are reserved, such as
the real numbers $\mathbb{R}$ and the complex numbers $\mathbb{C}$). To denote that something
is an element (or a member) of a set we use '$\in$', so that $7 \in \{3, 5, 7\}$ while
$8 \notin \{3, 5, 7\}$.

Sets satisfy the Principle of Extensionality, that two sets with the same
elements are equal. Because of this, the order of the elements does not matter
$\{2, \pi\} = \{\pi, 2\}$, and repeats collapse $\{7, 7\} = \{7\}$.

We say that $A$ is a subset of $B$, written $A \subseteq B$, if any element of $A$ is an
element of $B$. We use '$\subset$' for the proper subset relationship that $A$ is a subset
of $B$ but $A \neq B$. An example is $\{2, \pi\} \subset \{2, \pi, 7\}$. These symbols may be flipped,
for instance $\{2, \pi, 5\} \supset \{2, 5\}$.

Because of Extensionality, to prove that two sets are equal $A = B$ show that
they have the same members. Usually we show mutual inclusion, that both
$A \subseteq B$ and $A \supseteq B$.

When a set has no members then it is the empty set $\{\}$, symbolized $\emptyset$.
Any set has the empty set for a subset by the 'vacuously true' property of the
definition of implication.

Set operations Venn diagrams are handy here. For instance, we can picture
'$x \in P$'

and '$P \subseteq Q$'.

This is a repeat of the diagram for 'if ... then ...' because '$P \subseteq Q$' means 'if
$x \in P$ then $x \in Q$'.

For every propositional logic operator there is an associated set operator.
The complement of $P$ is $P^{\text{comp}} = \{x \mid \text{not}(x \in P)\}$

the union is $P \cup Q = \{x \mid (x \in P) \text{ or } (x \in Q)\}$

and the intersection is $P \cap Q = \{x \mid (x \in P) \text{ and } (x \in Q)\}$.

Multisets A multiset is a collection in which order does not matter, just as
with sets, but in contrast with sets repeats do not collapse. Thus the multiset
$\{2, 1, 2\}$ is the same as the multiset $\{1, 2, 2\}$ but differs from the multiset $\{1, 2\}$.
Note that we use the same $\{\ldots\}$ notation as for sets. Also as with sets, we say
$A$ is a multiset subset of $B$ if $A$ is a subset of $B$ and $A$ is a multiset.
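The set operators and the contrast between sets and multisets can be tried out in Python. This is only a sketch under stated assumptions: the complement is taken relative to a small hypothetical universe of the numbers 0 through 19, P is the primes in that universe, Q is the even numbers, and `collections.Counter` stands in for a multiset.

```python
from collections import Counter

# Hypothetical universe; a complement only makes sense relative to one.
universe = set(range(20))
P = {n for n in universe if n > 1 and all(n % d for d in range(2, n))}  # primes
Q = {n for n in universe if n % 2 == 0}                                  # evens

complement_P = universe - P      # {x | not (x in P)}
union_PQ = P | Q                 # {x | (x in P) or (x in Q)}
intersection_PQ = P & Q          # {x | (x in P) and (x in Q)}
print(sorted(complement_P), sorted(union_PQ), sorted(intersection_PQ))

# Sets: order does not matter and repeats collapse.
print({2, 7} == {7, 2}, {7, 7} == {7})

# Multisets: order does not matter but repeats do not collapse.
print(Counter([2, 1, 2]) == Counter([1, 2, 2]), Counter([2, 1, 2]) == Counter([1, 2]))
```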

Sequences In addition to sets and multisets, we also use collections where order
matters and where repeats do not collapse. These are sequences, denoted with
angle brackets: $\langle 2, 3, 7 \rangle \neq \langle 2, 7, 3 \rangle$. A sequence of length 2 is an ordered pair,
and is often written with parentheses: $(\pi, 3)$. We also sometimes say 'ordered
triple', 'ordered 4-tuple', etc. The set of ordered $n$-tuples of elements of a set $A$
is denoted $A^n$. Thus $\mathbb{R}^2$ is the set of pairs of reals.

Functions When we first learn about functions they are presented as formulas
such as $f(x) = 16x^2 - 100$. But progressing to more advanced mathematics
reveals more general functions: trigonometric ones, exponential and logarithmic
ones, and even constructs like absolute value that involve piecing together parts.
And some functions take inputs that are not numbers: the function that returns
the distance from a point to the origin $\sqrt{x^2 + y^2}$ takes the ordered pair
$(x, y)$ as its argument. So we see that functions aren't formulas, instead the key
idea is that a function associates with each input $x$ a single output $f(x)$.

Consequently, a function or map is defined to be a set of ordered pairs
$(x, f(x))$ such that $x$ suffices to determine $f(x)$. Restated, that is: if $x_1 = x_2$ then
$f(x_1) = f(x_2)$ (this is the requirement that a function must be well-defined).*

Each input $x$ is one of the function's arguments. Each output $f(x)$ is a value
(often where $x$ is the input the output is denoted $y$). The set of all arguments is
$f$'s domain and the set of output values is its range. Usually we don't need to
know what is and is not in the range and we instead work with a convenient
superset of the range, the codomain. The notation for a function $f$ with domain
$X$ and codomain $Y$ is $f\colon X \to Y$.



We also use the notation $x \mapsto 16x^2 - 100$, read '$x$ maps under $f$ to $16x^2 - 100$'
or '$16x^2 - 100$ is the image of $x$'.

A map such as $x \mapsto \sin(1/x)$ is a combination of simple maps, here $g(y) =
\sin(y)$ applied to the image of $f(x) = 1/x$. The composition of $g\colon Y \to Z$ with
$f\colon X \to Y$ is the map sending $x \in X$ to $g(f(x)) \in Z$. It is denoted $g \circ f\colon X \to Z$.
This definition only makes sense if the range of $f$ is a subset of the domain of $g$.

An identity map $\text{id}\colon Y \to Y$ defined by $\text{id}(y) = y$ has the property that for
any $f\colon X \to Y$, the composition $\text{id} \circ f$ is equal to $f$. So an identity map plays the
same role with respect to function composition that the number 0 plays in real
number addition or that 1 plays in multiplication.

*More on this is in the section on isomorphisms.

In line with that analogy, we define a left inverse of a map $f\colon X \to Y$ to be
a function $g\colon \text{range}(f) \to X$ such that $g \circ f$ is the identity map on $X$. A right
inverse of $f$ is an $h\colon Y \to X$ such that $f \circ h$ is the identity.

A map that is both a left and right inverse of $f$ is called simply an inverse.
An inverse, if one exists, is unique because if both $g_1$ and $g_2$ are inverses of $f$
then $g_1(x) = g_1 \circ (f \circ g_2)(x) = (g_1 \circ f) \circ g_2(x) = g_2(x)$ (the middle equality
comes from the associativity of function composition), so we often call it "the"
inverse, written $f^{-1}$. For instance, the inverse of the function $f\colon \mathbb{R} \to \mathbb{R}$ given
by $f(x) = 2x - 3$ is the function $f^{-1}\colon \mathbb{R} \to \mathbb{R}$ given by $f^{-1}(x) = (x + 3)/2$.
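Composition and the inverse in this example can be checked on sample inputs. In this Python sketch the helper `compose` and the names `f_inv`, `left`, and `right` are illustrative; checking a handful of integers is a sanity check, not a proof that the two maps are inverse on all of the reals.

```python
def compose(g, f):
    """Return the composition g o f, the map sending x to g(f(x))."""
    return lambda x: g(f(x))

def f(x):
    return 2 * x - 3

def f_inv(x):
    return (x + 3) / 2

left = compose(f_inv, f)    # should act as the identity: f_inv(f(x)) = x
right = compose(f, f_inv)   # should act as the identity: f(f_inv(x)) = x

for x in range(-3, 4):
    print(x, left(x), right(x))
```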

The superscript '$f^{-1}$' notation for function inverse can be confusing since
it clashes with $1/f(x)$. But it fits into a larger scheme. Functions that have
the same codomain as domain can be iterated, so that where $f\colon X \to X$, we can
consider the composition of $f$ with itself: $f \circ f$, and $f \circ f \circ f$, etc. We write $f \circ f$ as
$f^2$ and $f \circ f \circ f$ as $f^3$, etc. Note that the familiar exponent rules for real numbers
hold: $f^i \circ f^j = f^{i+j}$ and $(f^i)^j = f^{i \cdot j}$. Then where $f$ is invertible, writing $f^{-1}$ for
the inverse and $f^{-2}$ for the inverse of $f^2$, etc., gives that these familiar exponent
rules continue to hold, once we define $f^0$ to be the identity map.

If the codomain $Y$ equals the range of $f$ then we say that the function is onto.
A function has a right inverse if and only if it is onto (this is not hard to check).
If no two arguments share an image, if $x_1 \neq x_2$ implies that $f(x_1) \neq f(x_2)$, then
the function is one-to-one. A function has a left inverse if and only if it is
one-to-one (this is also not hard to check).

By the prior paragraph, a map has an inverse if and only if it is both onto
and one-to-one. Such a function is a correspondence. It associates one and 
only one element of the domain with each element of the range. Because a 
composition of one-to-one maps is one-to-one, and a composition of onto maps 
is onto, a composition of correspondences is a correspondence. 

We sometimes want to shrink the domain of a function. For instance, we may
take the function $f\colon \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$ and, in order to have an inverse,
limit input arguments to nonnegative reals $\hat{f}\colon \mathbb{R}^+ \to \mathbb{R}$. Then $\hat{f}$ is a different
function than $f$; we call it the restriction of $f$ to the smaller domain.

Relations For some familiar operations we most naturally interpret them as 
functions: addition maps (5,3) to 8. But what of '<' or '='? We can take the 
approach of rephrasing '3 < 5' to '(3,5) is in the relation <'. That is, define a 
binary relation on a set A to be a set of ordered pairs of elements of A. For 
example, the < relation is the set {(a, b) | a < b}; some elements of that set are 
(3,5), (3,7), and (1,100). 

Another binary relation on the integers is equality; this relation is
the set $\{\ldots, (-1,-1), (0,0), (1,1), \ldots\}$. Still another example is 'closer than 10',
the set $\{(x,y) \mid |x - y| < 10\}$. Some members of that relation are (1,10), (10,1),
and (42,44). Neither (11,1) nor (1,11) is a member.
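
Because a relation is just a set of ordered pairs, it can be written down directly as a Python set. The sketch below is limited to a finite slice of the numbers (the name `universe` and its bounds are arbitrary assumptions), since the full relations are infinite.

```python
universe = range(0, 50)

# The '<' relation and the 'closer than 10' relation, as sets of pairs.
less_than = {(a, b) for a in universe for b in universe if a < b}
closer_than_10 = {(x, y) for x in universe for y in universe
                  if abs(x - y) < 10}

print((3, 5) in less_than, (5, 3) in less_than)
print((1, 10) in closer_than_10, (42, 44) in closer_than_10)
print((11, 1) in closer_than_10, (1, 11) in closer_than_10)
```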

Those examples illustrate the generality of the definition. All kinds of
relationships (e.g., 'both numbers even' or 'first number is the second with the
digits reversed') are covered.

Equivalence Relations We shall need to express that two objects are alike in 
some way. They aren't identical, but they are related (e.g., two integers that 
'give the same remainder when divided by 2'). 

A binary relation $\{(a,b), \ldots\}$ is an equivalence relation when it satisfies
(1) reflexivity: any object is related to itself, (2) symmetry: if a is related
to b then b is related to a, and (3) transitivity: if a is related to b and b is
related to c then a is related to c. Some examples (on the integers): '=' is an
equivalence relation, '<' does not satisfy symmetry, 'same sign' is an equivalence,
while 'closer than 10' fails transitivity.
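
The three conditions can be tested mechanically on a finite set. The following sketch checks the four example relations on the nonzero integers from -15 to 15; restricting to nonzero integers is an assumption made here so that every number has a definite sign, and the function names are hypothetical.

```python
def is_reflexive(rel, elems):
    return all((a, a) in rel for a in elems)

def is_symmetric(rel):
    return all((b, a) in rel for (a, b) in rel)

def is_transitive(rel):
    # Whenever (a, b) and (b, d) are present, (a, d) must be too.
    return all((a, d) in rel
               for (a, b) in rel for (c, d) in rel if b == c)

def properties(rel, elems):
    return (is_reflexive(rel, elems), is_symmetric(rel), is_transitive(rel))

elems = [n for n in range(-15, 16) if n != 0]

def relation(test):
    # Build the set of pairs over elems that satisfy the given test.
    return {(a, b) for a in elems for b in elems if test(a, b)}

equality = relation(lambda a, b: a == b)
less_than = relation(lambda a, b: a < b)
same_sign = relation(lambda a, b: a * b > 0)
closer_than_10 = relation(lambda a, b: abs(a - b) < 10)

for name, rel in [("=", equality), ("<", less_than),
                  ("same sign", same_sign),
                  ("closer than 10", closer_than_10)]:
    print(name, properties(rel, elems))
```

Each printed triple gives reflexivity, symmetry, and transitivity in that order. For 'closer than 10' one witness of the failure is that -5 is related to 4 and 4 is related to 13, but -5 is not related to 13.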

Partitions In the 'same sign' relation $\{(1,3), (-5,-7), (-1,-1), \ldots\}$ there are
two kinds of pairs, the ones with both numbers positive and those with both 
negative. So integers fall into exactly one of two classes, positive or negative. 

A partition of a set $S$ is a collection of subsets $\{S_0, S_1, S_2, \ldots\}$ such that
every element of $S$ is in one and only one subset: $S_0 \cup S_1 \cup S_2 \cup \cdots = S$, and if $i \neq j$
then $S_i \cap S_j = \emptyset$. Picture that $S$ is decomposed into non-overlapping parts.
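
The two conditions in this definition translate directly into a check on finite sets. This is a minimal sketch following the definition as stated here (it does not additionally require the parts to be nonempty, as some texts do); `is_partition` is a hypothetical name.

```python
def is_partition(parts, s):
    s = set(s)
    parts = [set(p) for p in parts]
    # Condition 1: the union of the parts is all of s.
    covers = set().union(*parts) == s
    # Condition 2: distinct parts have empty intersection.
    disjoint = all(parts[i].isdisjoint(parts[j])
                   for i in range(len(parts))
                   for j in range(i + 1, len(parts)))
    return covers and disjoint

nonzero = [n for n in range(-5, 6) if n != 0]
positives = {n for n in nonzero if n > 0}
negatives = {n for n in nonzero if n < 0}

print(is_partition([positives, negatives], nonzero))
print(is_partition([positives, negatives | {1}], nonzero))  # parts overlap
print(is_partition([positives], nonzero))                   # negatives missed
```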




Thus, the first paragraph says 'same sign' partitions the integers into the positives 
and the negatives. Similarly, the equivalence relation '=' partitions the integers 
into one-element sets. 

An example is the set of fractions $S = \{n/d \mid n, d \in \mathbb{Z} \text{ and } d \neq 0\}$. We define
two members $n_1/d_1$ and $n_2/d_2$ of $S$ to be equivalent if $n_1 d_2 = n_2 d_1$. We
can check that this is an equivalence relation, that it satisfies the above three
conditions. So $S$ is partitioned.
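
To see the partition emerge, the sketch below stores a fraction as a pair `(n, d)` of integers, tests equivalence by cross-multiplying, and sorts a handful of sample fractions into classes. The pair representation, the helper names, and the sample list are assumptions made for illustration.

```python
def equivalent(p, q):
    # n1/d1 and n2/d2 are equivalent when n1*d2 == n2*d1.
    (n1, d1), (n2, d2) = p, q
    return n1 * d2 == n2 * d1

def classes(items, related):
    # Put each item into the first class whose first member it is
    # related to; start a new class if there is none. This relies on
    # `related` being an equivalence relation.
    result = []
    for item in items:
        for cls in result:
            if related(item, cls[0]):
                cls.append(item)
                break
        else:
            result.append([item])
    return result

fractions = [(1, 1), (2, 2), (2, 4), (-2, -4), (0, 1), (4, 3), (8, 6)]
for cls in classes(fractions, equivalent):
    print(cls)
```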



[Figure: the set of fractions divided into parts; the labeled points include 1/1, 2/2, 2/4, -2/-4, 0/1, 4/3, and 8/6.]





Every equivalence relation induces a partition, and every partition is induced 
by an equivalence. (This is routine to check.) Below are two examples. 

Consider the equivalence relationship between two integers of 'give the
same remainder when divided by 2', the set $P = \{(-1,3), (2,4), (0,0), \ldots\}$
(this is more briefly stated as 'same parity'). In the set $P$ are two kinds
of pairs, the ones with both members even and the ones with both members
odd. This equivalence induces a partition where the parts are found by:
for each $x$ we define the set of numbers related to it $S_x = \{y \mid (x,y) \in P\}$.
Some parts are $S_1 = \{\ldots, -3, -1, 1, 3, \ldots\}$, and $S_2 = \{\ldots, -2, 0, 2, 4, \ldots\}$, and
$S_{-1} = \{\ldots, -3, -1, 1, 3, \ldots\}$. Note that there are only two parts; for instance
$S_1 = S_{-1}$ is the odd numbers and $S_2 = S_4$ is the evens.
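
The passage from the relation to its parts can be carried out on a finite window of the integers. In this sketch the window `universe` and the function name `part` are arbitrary choices; `part(x)` computes the set of numbers related to `x`, restricted to that window.

```python
universe = range(-6, 7)

# The 'same parity' relation as a set of ordered pairs.
P = {(x, y) for x in universe for y in universe if (x - y) % 2 == 0}

def part(x):
    # The set of numbers related to x.
    return frozenset(y for y in universe if (x, y) in P)

parts = {part(x) for x in universe}

print(len(parts))
print(part(1) == part(-1), part(2) == part(4))
print(sorted(part(1)))
print(sorted(part(2)))
```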

Now consider the partition of the natural numbers where two numbers are 
in the same part if they leave the same remainder when divided by 10, that is, 
if they have the same least significant digit. This partition is induced by the 
equivalence relation R defined by: two numbers n, m are related if they are 
together in the same part. The three conditions in the definition of equivalence 
are straightforward. For example, 3 is related to 33, but 3 is not related to 102. 
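
Going the other way, from a partition to a relation, looks like this in a small sketch. It uses the natural numbers below 200 (an arbitrary bound assumed here), groups them by last digit, and declares two numbers related exactly when some part contains both.

```python
numbers = range(0, 200)

# Build the parts: one set for each possible last digit.
parts = {}
for n in numbers:
    parts.setdefault(n % 10, set()).add(n)

def related(n, m):
    # n and m are related when they lie together in the same part.
    return any(n in part and m in part for part in parts.values())

print(len(parts))
print(related(3, 33), related(3, 102))
```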

We call each part of a partition an equivalence class. We sometimes pick a 
single element of each equivalence class to be the class representative. 



Usually when we pick representatives we have some natural scheme in mind. In 
that case we call them the canonical representatives. 

An example is the simplest form of a fraction. The two fractions 3/5 and
9/15 are equivalent. In everyday work we often prefer to use the 'simplest form'
or 'reduced form' fraction 3/5 as the class representative.
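
Computing the reduced form is a matter of dividing out the greatest common divisor. The sketch below does this for a fraction stored as a pair of integers; the name `reduced` and the convention that the representative has a positive denominator are assumptions made here, not something the text specifies.

```python
from math import gcd

def reduced(n, d):
    # Return the canonical representative of the class of n/d:
    # lowest terms, with a positive denominator (an assumed convention).
    if d == 0:
        raise ValueError("denominator must be nonzero")
    g = gcd(n, d)
    if d < 0:
        g = -g
    return (n // g, d // g)

print(reduced(9, 15))
print(reduced(3, 5))
print(reduced(-2, -4))
```

Two fractions are then equivalent exactly when they have the same reduced form, so the representative can stand in for the whole class.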